Pinecone: A Managed Vector Database

Vectors, the output format of neural network embedding models, encode the meaning of text and other data as lists of numbers. They play a central role in AI applications such as knowledge bases, semantic search, and Retrieval-Augmented Generation (RAG).

Pinecone is a managed, cloud-native vector database designed for high performance and scalability. In this guide, we will walk you through how to use Pinecone to store and search vectors in the cloud.

Where Chapter 8’s from-scratch notebook kept everything in a Pandas DataFrame, Pinecone pushes the vectors to a managed cloud index — same retrieval logic, very different operational footprint (API keys, regions, per-request latency).

Install Dependencies¶

First, we install the required libraries: pinecone (the current name of the package formerly published as pinecone-client), sentence-transformers, pandas, and python-dotenv.

!pip install pinecone sentence-transformers pandas python-dotenv

Initialize Pinecone Client¶

To use Pinecone, you need an API key from the Pinecone console. Create a Pinecone account, generate an API key, and add it to your Google Colab secrets under the name PINECONE_KEY, which is the name the code below reads.

Importing Pinecone and Environment Setup¶

We import the Pinecone client, dotenv for loading environment variables from a .env file when running locally, and Colab's userdata module for reading secrets. Keeping the API key in a secret store rather than in code is essential for security.

import os
from dotenv import load_dotenv
from pinecone import Pinecone, ServerlessSpec
from google.colab import userdata

Authenticating with Pinecone¶

The Pinecone API key is loaded from Colab secrets and passed to the client constructor. We also set an optional Hugging Face token for the sentence-transformer model used later.

os.environ["PINECONE_API_KEY"] = userdata.get('PINECONE_KEY')

# Optional (as of this writing, in Colab): set a Hugging Face token.
# Recommended, because it is used whenever sentence-transformers models are loaded.
os.environ["HF_TOKEN"] = userdata.get('HUGGINGFACE_TOKEN')

# Initialize the Pinecone client
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
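When a secret is missing or misnamed, it helps to fail early with a message that names the variable. The following is a minimal sketch using only the standard library; `require_env` is a hypothetical helper name (not part of the Pinecone SDK), and the value shown is a placeholder rather than a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your secret store before creating the client."
        )
    return value


# Demonstration with a placeholder value (not a real API key)
os.environ["DEMO_API_KEY"] = "placeholder-value"
demo_key = require_env("DEMO_API_KEY")
```

In this notebook, the same helper could wrap the lookup passed to the client constructor, for example `Pinecone(api_key=require_env("PINECONE_API_KEY"))`.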

Set Up Embedding Model¶

We will use the sentence-transformers library to embed the text data. The model chosen below, all-MiniLM-L6-v2, generates 384-dimensional embeddings.

Loading the Embedding Model¶

We use all-MiniLM-L6-v2 from sentence-transformers, the same model used in the LanceDB example. Consistency in the embedding model is important because vectors from different models are not comparable.

from sentence_transformers import SentenceTransformer

# Initialize the embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

Create a Pinecone Index¶

We need an index in Pinecone to store our vectors. The dimension parameter must match the size of the embeddings generated by our model (384 dimensions).

Creating a Pinecone Index¶

An index in Pinecone is the container for your vectors. The dimension must match the embedding model output (384), and we choose cosine similarity as the distance metric. The serverless spec means Pinecone manages all infrastructure for you.

index_name = "is-research"

# Check if the index exists, and if not, create it
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384, # Match the 'all-MiniLM-L6-v2' output dimension
        metric='cosine',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

# Connect to the index
index = pc.Index(index_name)
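To make the `metric='cosine'` setting concrete, here is a small plain-Python sketch of cosine similarity on toy vectors. It is only an illustration of the measure, not Pinecone's implementation; the function name and the example vectors are made up for this sketch. It also shows why the dimension must match: the score is undefined for vectors of different lengths.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity close to 1
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors: similarity 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions: similarity -1
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because cosine similarity ignores vector length and looks only at direction, two abstracts of very different lengths can still score as close matches if they discuss the same topic.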

Prepare and Load Data¶

Load our existing dataset, ISResearch.csv, into a Pandas DataFrame. Since Pinecone operates on the cloud, we will embed the abstracts locally and push batches of (id, vector, metadata) directly to the index.

Embedding and Upserting Data¶

Unlike the LanceDB example, where embeddings were generated automatically, this workflow has you generate the embeddings yourself before uploading. We encode all abstracts locally, package each vector with its metadata, and upsert in batches of 100 to stay within payload limits.

import pandas as pd

# Load your CSV
df = pd.read_csv("https://raw.githubusercontent.com/KarAnalytics/datasets/refs/heads/master/ISResearch.csv")

df.head()

# Generate embeddings for all abstracts
abstracts = df['Abstract'].tolist()
embeddings = model.encode(abstracts)

# Prepare data in a format Pinecone accepts: a list of dicts with "id", "values", and "metadata" keys
vectors_to_upsert = []
# enumerate gives the row position, which matches the order of `embeddings`
# even if the DataFrame index is not 0..n-1
for pos, (_, row) in enumerate(df.iterrows()):
    vectors_to_upsert.append({
        "id": str(row['id']),
        "values": embeddings[pos].tolist(),
        "metadata": {
            "Title": row['Title'],
            "Year": row['Year'],
            "URL": row['URL'],
            "Journal": row['JournalFN']
        }
    })

# Upsert (insert or update) the vectors into Pinecone in batches to avoid payload limits
batch_size = 100
for i in range(0, len(vectors_to_upsert), batch_size):
    batch = vectors_to_upsert[i:i + batch_size]
    index.upsert(vectors=batch)

print(f"Migration Complete. Upserted {len(vectors_to_upsert)} vectors to Pinecone.")
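Two things can go wrong quietly in an upsert loop like the one above: a vector whose length does not match the index dimension, and metadata fields that are missing in the CSV (NaN after loading), which the service may reject. The sketch below shows one way to check records before sending them, using toy data and no network calls. The helper names (`clean_metadata`, `validate_record`, `chunked`) and the toy records are assumptions made for illustration, not part of the Pinecone SDK.

```python
import math
from typing import Any, Iterator

EXPECTED_DIM = 384  # assumed to match the index dimension used in this notebook


def clean_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Drop keys whose value is None or NaN."""
    return {
        key: value
        for key, value in metadata.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    }


def validate_record(record: dict[str, Any], dim: int = EXPECTED_DIM) -> dict[str, Any]:
    """Check the vector length and return a copy with cleaned metadata."""
    if len(record["values"]) != dim:
        raise ValueError(
            f"record {record['id']!r} has {len(record['values'])} values, expected {dim}"
        )
    return {**record, "metadata": clean_metadata(record.get("metadata", {}))}


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy records standing in for vectors_to_upsert; Year is deliberately missing (NaN)
records = [
    {
        "id": str(i),
        "values": [0.0] * EXPECTED_DIM,
        "metadata": {"Title": f"Paper {i}", "Year": float("nan")},
    }
    for i in range(250)
]

cleaned = [validate_record(r) for r in records]
batch_sizes = [len(batch) for batch in chunked(cleaned, 100)]
```

With 250 toy records and a batch size of 100, the last batch is smaller than the others, which is the same behavior as the slicing loop in the notebook. Each batch produced by `chunked` could then be passed to `index.upsert(vectors=batch)`.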

Querying the Index¶

To search, we embed the query with the same model and send the vector to Pinecone. The top_k parameter controls how many nearest neighbors to return, and include_metadata=True brings back the paper titles and URLs alongside the similarity scores.

Semantic Search¶

Now we can perform semantic searches. First, we generate an embedding for our search query and then we send it to Pinecone to retrieve the top matches along with their metadata.

query = "Which papers mention blockchain or decentralized systems?"

# Embed the query text
query_vector = model.encode([query])[0].tolist()

# Search the Pinecone index
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True # Ensure we retrieve the Title, Year, and URL
)

print("\n--- Top 5 Semantic Search Results ---")
for match in results['matches']:
    metadata = match['metadata']
    print(f"Score: {match['score']:.4f} | Year: {metadata.get('Year')} | Title: {metadata.get('Title')}")

# Optional: show the URL of the top result (guarding against an empty result set)
if results['matches']:
    top_match = results['matches'][0]['metadata']
    print(f"\nTop Match URL: {top_match.get('URL')}")
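To see what `top_k` and a metadata restriction mean conceptually, the following sketch runs the same kind of query against a tiny in-memory list. It is a brute-force illustration only: a managed index typically uses approximate search rather than scoring every vector, and the function `brute_force_query`, its `min_year` parameter, and the toy records are all hypothetical names introduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_query(records, vector, top_k=5, min_year=None):
    """Score every record against `vector` and return the top_k matches.

    If `min_year` is given, records with an earlier Year are excluded
    before scoring, mimicking a metadata filter.
    """
    candidates = [
        r for r in records
        if min_year is None or r["metadata"].get("Year", 0) >= min_year
    ]
    scored = [
        {"id": r["id"], "score": cosine(vector, r["values"]), "metadata": r["metadata"]}
        for r in candidates
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return {"matches": scored[:top_k]}


# Toy 2-dimensional records (real embeddings in this notebook have 384 dimensions)
toy_records = [
    {"id": "a", "values": [1.0, 0.0], "metadata": {"Title": "Paper A", "Year": 2012}},
    {"id": "b", "values": [0.8, 0.6], "metadata": {"Title": "Paper B", "Year": 2019}},
    {"id": "c", "values": [0.0, 1.0], "metadata": {"Title": "Paper C", "Year": 2021}},
]

all_results = brute_force_query(toy_records, [1.0, 0.0], top_k=2)
recent_results = brute_force_query(toy_records, [1.0, 0.0], top_k=2, min_year=2015)
```

Without the year restriction the two closest records are returned; with it, the closest record is excluded because of its metadata, and the next-best candidates take its place. In Pinecone itself, a comparable restriction is expressed through the `filter` argument of `index.query`, for example `filter={"Year": {"$gte": 2015}}`.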

Key takeaways¶

Pinecone is a managed cloud vector database -- you get an API key, create an index, and Pinecone handles infrastructure, scaling, and availability for you.
Index dimension must match the embedding model output (384 for all-MiniLM-L6-v2), and the metric (cosine here) is fixed at index creation time.
Manual embedding is used here -- unlike LanceDB’s auto-embed, this workflow encodes text locally and upserts records (id, vector, metadata) in batches.
Metadata travels with each vector and is returned at query time via include_metadata=True, so results need no second lookup. The same metadata fields can also be used in a query filter to combine filtering with semantic search.
Serverless spec on AWS means no servers to provision, but the tradeoff is API keys, network latency, and per-request cost versus a local embedded store.

Run the code¶

To run this notebook, copy the URL below into your browser’s address bar. The link opens the notebook directly in Google Colab. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually -- the viewer may have truncated the link at a line break.)

Estimated run time: ~5 minutes (requires Pinecone API key)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/Pinecone_vectorDB.ipynb