
How to build a custom embedder in LlamaIndex: AWS Titan Multimodal example

14 min read
Norah Sakal
AI Consultant & Developer


LlamaIndex makes it easy to build AI-powered search, but if you're working with multimodal embeddings (text + images), like the AWS Titan multimodal model, you'll notice it's not natively supported.

For e-commerce search, I need embeddings that capture both product descriptions and images to generate more accurate search results.

This guide will show you how to override LlamaIndex's default embedder to use AWS Titan Multimodal.

Why use a custom embedding model?

LlamaIndex supports AWS Bedrock, but not Titan Multimodal yet. Here's what happens if we try to use the default embedder against an index built for Titan's vectors:

results = retriever.retrieve("red shoes")

❌ Error: Mismatched vector dimensions

Error

You'll see this error in your Jupyter Notebook:

results = retriever.retrieve("red shoes")
PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 04 Mar 2025 17:46:52 GMT', 'Content-Type': 'application/json', 'Content-Length': '104', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '90', 'x-pinecone-request-id': '3238657866182661992', 'x-envoy-upstream-service-time': '2', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Vector dimension 1536 does not match the dimension of the index 1024","details":[]}

Why Does This Happen?

By default, LlamaIndex uses OpenAI's text-embedding-ada-002 with 1536 dimensions, while AWS Titan generates 1024-dimensional vectors.
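If you want to confirm the mismatch yourself once the Pinecone client and the request_embedding helper from the steps below are set up, a quick check looks like this (a sketch, not part of the original flow):

# Compare the index dimension with the length of a Titan embedding
print(pc.describe_index(index_name).dimension)                # 1024 for a Titan-sized index
print(len(request_embedding(text_description="red shoes")))  # 1024 from Titan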

Solution: Override LlamaIndex's BaseEmbedding class with a custom multimodal embedder.

Step 1: Install required packages

To set up AWS Titan, LlamaIndex, and Pinecone, install the following:

Terminal
pip install boto3 llama-index pinecone-client pinecone-text ipython

Then, import the necessary libraries:

import base64
from IPython.display import display, Image, HTML
import json
import os
from typing import Any, List, Optional

import boto3
from botocore.exceptions import NoCredentialsError

# LlamaIndex
from llama_index.core import VectorStoreIndex
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.schema import QueryBundle

# LlamaIndex vector stores
from llama_index.vector_stores.pinecone import PineconeVectorStore

# LlamaIndex retrievers
from llama_index.core.retrievers import VectorIndexRetriever

# Pinecone
from pinecone import Pinecone, ServerlessSpec
from pinecone_text.sparse import BM25Encoder

Step 2: Set Up AWS Bedrock client

We initialize the AWS Bedrock client to connect with AWS:

# Define your AWS profile
# Set the AWS_PROFILE environment variable to the name of your AWS CLI profile,
# or leave it unset to use your default AWS profile
aws_profile = os.environ.get('AWS_PROFILE')

# Specify the AWS region where Bedrock is available
aws_region_name = "us-east-1"

try:
    # Set the default session for the specified profile
    if aws_profile:
        boto3.setup_default_session(profile_name=aws_profile)
    else:
        boto3.setup_default_session()  # Use the default AWS profile if none is specified

    # Initialize the Bedrock runtime client
    bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name=aws_region_name
    )
except NoCredentialsError:
    print("AWS credentials not found. Please configure your AWS profile.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Step 3: Define a function to generate custom embeddings

AWS Titan accepts text and/or image inputs, but they must be sent in a specific JSON format:

def encode_image(image_path: str) -> str:
    """
    Convert an image file to a Base64 string.
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def request_embedding(image_path=None, text_description=None):
    """
    Request embeddings from the AWS Titan multimodal model.

    Parameters:
        image_path (str, optional): Path to an image file.
        text_description (str, optional): Text description.

    Returns:
        list: Embedding vector.
    """
    image_base64 = encode_image(image_path) if image_path else None
    input_data = {"inputImage": image_base64, "inputText": text_description}
    body = json.dumps(input_data)

    # Invoke the Titan multimodal model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response.get("body").read())

    if response_body.get("message"):
        raise ValueError(f"Embeddings generation error: {response_body.get('message')}")

    return response_body.get("embedding")

If we call this function with a text snippet:

request_embedding(text_description='red shoes')

We'll receive a 1024-dimensional vector from AWS Bedrock:

[0.043701172, 0.032958984, -0.033935547, 0.0033569336, -0.006866455, 0.01953125, 0.09765625, ..., ]
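The same helper also accepts an image. A minimal sketch, assuming a product photo such as the 1082.jpg file used later in this guide sits next to your notebook:

# Combine a text description with a product image (the file name is an assumption)
combined_vector = request_embedding(
    image_path="1082.jpg",
    text_description="red heels"
)
print(len(combined_vector))  # Expect 1024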

Step 4: Create a custom embedding class

The default BaseEmbedding class in LlamaIndex only supports text. We'll override it to handle both text and images using AWS Titan Multimodal.

We'll build the class step by step, then show the full implementation at the end.

The default BaseEmbedding class has 5 functions we need to override:

main.py
async def _aget_query_embedding()

async def _aget_text_embedding()

def _get_query_embedding()

def _get_text_embedding()

def _get_text_embeddings()

What our class will do

  1. Override LlamaIndex's BaseEmbedding class to use Titan
  2. Define methods for single & batch embeddings
  3. Handle both synchronous & async requests

Step 4.1: Create the class skeleton
First, define the class and register it in LlamaIndex:

class MultimodalEmbeddings(BaseEmbedding):
    """
    Custom embedding class for AWS Titan multimodal embeddings.
    """

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "multimodal"

What this does:

  • Inherits from BaseEmbedding, so it works within LlamaIndex
  • Registers itself under the "multimodal" class name

Step 4.2: Add synchronous embedding methods
Next, we override the synchronous functions to request embeddings for queries and text:

def _get_query_embedding(self, query: str, image_path: Optional[str] = None) -> List[float]:
    """
    Get embeddings for a query string and optional image.
    """
    return request_embedding(image_path=image_path, text_description=query)

What this does:

  • Calls request_embedding() with the query text
  • Returns a 1024-dimensional vector from AWS Titan

Now, let's add the text embedding function:

def _get_text_embedding(self, text: str, image_path: Optional[str] = None) -> List[float]:
    """
    Get embeddings for a text string and optional image.
    """
    return request_embedding(image_path=image_path, text_description=text)

What this does:

  • Similar to _get_query_embedding, but used when embedding document text instead of search queries

Finally, we add support for batch text embeddings:

def _get_text_embeddings(self, texts: List[str], image_paths: Optional[List[str]] = None) -> List[List[float]]:
    """
    Get embeddings for a batch of text strings with optional images.
    """
    image_paths = image_paths or [None] * len(texts)  # Ensure image list matches text list length
    return [request_embedding(image_path=img, text_description=txt) for txt, img in zip(texts, image_paths)]

What this does:

  • Loops through a list of texts and returns embeddings for each
  • Useful when processing multiple product descriptions at once

Step 4.3: Add asynchronous methods
LlamaIndex supports async embedding requests, so we need to mirror the previous methods with async versions:

async def _aget_query_embedding(self, query: str, image_path: Optional[str] = None) -> List[float]:
    return self._get_query_embedding(query, image_path)

async def _aget_text_embedding(self, text: str, image_path: Optional[str] = None) -> List[float]:
    return self._get_text_embedding(text, image_path)

What this does:

  • These async methods call the synchronous versions, ensuring compatibility with LlamaIndex async workflows
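This guide keeps the async methods as thin wrappers around the sync calls. If you'd rather not block the event loop, one alternative (not used in the rest of this guide) is to offload the Bedrock call to a worker thread:

import asyncio

# Alternative async variant (sketch): run the blocking Bedrock call in a worker thread
async def _aget_query_embedding(self, query: str, image_path: Optional[str] = None) -> List[float]:
    return await asyncio.to_thread(self._get_query_embedding, query, image_path)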

Step 4.4: Full Class Implementation
Now that we've built it step by step, here's the complete class:

class MultimodalEmbeddings(BaseEmbedding):
    """
    Custom embedder for AWS Titan multimodal model.
    Supports both text and image inputs.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls):
        return "multimodal"

    def _get_query_embedding(self, query: str, image_path: Optional[str] = None) -> List[float]:
        """
        Get embeddings for a query string and optional image.
        """
        return request_embedding(image_path=image_path, text_description=query)

    def _get_text_embedding(self, text: str, image_path: Optional[str] = None) -> List[float]:
        """
        Get embeddings for a text string and optional image.
        """
        return request_embedding(image_path=image_path, text_description=text)

    def _get_text_embeddings(self, texts: List[str], image_paths: Optional[List[str]] = None) -> List[List[float]]:
        """
        Get embeddings for a batch of text strings with optional images.
        """
        image_paths = image_paths or [None] * len(texts)  # Ensure image list matches text list length
        return [request_embedding(image_path=img, text_description=txt) for txt, img in zip(texts, image_paths)]

    async def _aget_query_embedding(self, query: str, image_path: Optional[str] = None) -> List[float]:
        return self._get_query_embedding(query, image_path)

    async def _aget_text_embedding(self, text: str, image_path: Optional[str] = None) -> List[float]:
        return self._get_text_embedding(text, image_path)

Now, LlamaIndex can use AWS Titan Multimodal as a custom embedder.

Step 5: Initialize the custom embedder

Now, let's initialize the custom MultimodalEmbeddings class:

# Instantiate the custom embedding model
embed_model = MultimodalEmbeddings()
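As an optional sanity check, you can call the public get_query_embedding method and confirm the vector length matches the Pinecone index you'll connect to next:

# Optional sanity check: the custom embedder should return 1024-dimensional vectors
vector = embed_model.get_query_embedding("red shoes")
print(len(vector))  # Expect 1024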

Step 6: Initialize Pinecone

Now, let's initialize your Pinecone index:

# Initialize Pinecone client with API key
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index_name = "YOUR_PINECONE_INDEX" # Replace with your desired index name
pinecone_index = pc.Index(index_name)
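If the index doesn't exist yet, create it before grabbing the Index handle above so its dimension matches Titan's 1024-dimensional output. A minimal sketch; the cloud and region values are assumptions, and Pinecone's hybrid (dense + sparse) search requires the dotproduct metric:

# Create the index if it doesn't exist yet (cloud/region are assumptions)
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,        # Must match Titan's embedding size
        metric="dotproduct",   # Required for hybrid (dense + sparse) search
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )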

Set up vector store

vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    add_sparse_vector=True  # Enables hybrid search
)

Create vector index

We'll then need a vector index that allows us to query the Pinecone index using the vector store we just initialized:

# Create a Vector Store Index
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
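This guide assumes the Pinecone index is already populated with product vectors. If yours is empty, here's a rough sketch of how you might ingest a product with the custom embedder; the products list, metadata keys, and image paths are illustrative assumptions:

from llama_index.core import StorageContext
from llama_index.core.schema import TextNode

# Hypothetical catalog rows; adapt to your own dataset
products = [
    {"product_id": "1082", "text": "Hm women red heels", "image": "data/footwear/1082.jpg"},
]

nodes = []
for product in products:
    node = TextNode(
        text=product["text"],
        metadata={"product_id": product["product_id"], "text": product["text"]},
    )
    # Pre-compute the multimodal embedding so LlamaIndex doesn't re-embed the node with text only
    node.embedding = request_embedding(
        image_path=product["image"],
        text_description=product["text"],
    )
    nodes.append(node)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)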

Step 7: Create a custom retriever

The default VectorIndexRetriever in LlamaIndex is designed for text-only embeddings.

Since AWS Titan generates a single vector for both text and images, we need to modify the retriever to:

  • Generate embeddings using both text & image
  • Use Titan's multimodal model instead of the default embedder
  • Keep LlamaIndex's automated Pinecone query mechanism

Define a custom retriever

We'll override the _retrieve() function so it automatically queries Pinecone with multimodal embeddings:

class TitanMultimodalRetriever(VectorIndexRetriever):
    """
    Custom retriever for AWS Titan multimodal embeddings.
    Uses a single Pinecone index for both text & image queries.
    """

    def _retrieve(self, query_bundle: QueryBundle) -> List:
        """
        Overrides retrieval to use AWS Titan's multimodal embedding.
        """
        # Generate the multimodal embedding using Titan
        query_bundle.embedding = request_embedding(
            text_description=query_bundle.query_str,
            image_path=query_bundle.image_path if hasattr(query_bundle, 'image_path') else None
        )

        # Pass the embedding to LlamaIndex's default retrieval process
        return super()._retrieve(query_bundle)

How this works:

  • Intercepts the retrieval request
  • Uses Titan to generate a 1024-dim vector (text + image)
  • Passes the embedding to LlamaIndex's default Pinecone query process
  • No manual Pinecone queries needed

Define a retriever

Let's define a simple retriever which uses our custom embedding model:

# Create a simple retriever
retriever = TitanMultimodalRetriever(
    index=vector_index,
    embed_model=embed_model,  # Your custom embedding model
    similarity_top_k=5,  # Retrieve the top 5 results
    vector_store_query_mode="hybrid",  # Enable hybrid search
    alpha=0.5  # Weighting between semantic and keyword search
)

Step 8: Run a query

Now that our retriever supports text + image queries, we can retrieve results using both modalities.

Example 1: Query with text only

# Create a query bundle
query_bundle = QueryBundle(query_str="red shoes")

# Query the vector store with the query bundle
results = retriever.retrieve(query_bundle)

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

You should see this output in your Jupyter Notebook:

Score: 2.2923
Text: Id men red shoes
--------------------------------------------------
Score: 2.2458
Text: Arrow men red shoes
--------------------------------------------------
Score: 2.2390
Text: Catwalk women red shoes
--------------------------------------------------
Score: 2.2381
Text: Vans men red old skool shoes
--------------------------------------------------
Score: 2.2366
Text: Cobblerz women red shoes
--------------------------------------------------

Step 9: Visualize the vector database pull

The vector database query returns a list of red shoes based on the embeddings. To verify the results, let's visualize the pulled vectors.

We'll create a function that loops through the retrieved nodes and displays each image along with its metadata in a row for easy inspection.

def display_nodes_with_images_in_row(vector_database_response_nodes, image_folder="data/footwear", img_width=150):
    html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"

    for node in vector_database_response_nodes:
        # Retrieve text and product_id from node metadata
        text = node.metadata.get('text')
        product_id = node.metadata.get('product_id')

        # Generate image path based on product_id
        image_path = os.path.join(image_folder, f"{product_id}.jpg")

        if os.path.exists(image_path):
            # Add each text and image in a flex container
            html_content += f"""
            <div style="text-align: center;">
                <p>{text}</p>
                <img src='{image_path}' width='{img_width}px' style="border: 1px solid #ddd; padding: 5px;"/>
            </div>
            """
        else:
            # Handle missing images gracefully
            html_content += f"""
            <div style="text-align: center;">
                <p>{text}</p>
                <p style='color: red;'>Image not found for product_id {product_id}</p>
            </div>
            """

    # Close the main div
    html_content += "</div>"

    # Display the content as HTML
    display(HTML(html_content))

Let's visualize the shoes retrieved from the vector database to confirm that the results match the query for "red shoes."

display_nodes_with_images_in_row(results)

When you run this, you'll see a row of shoe images displayed in your Notebook:


Examine the Shoes

As shown, all the retrieved shoes are red or have red details, confirming that the vector index query works well for focused queries.

Example 2: Query with text + image

Let's try to query our vector database with both a text string "red heels" and this image:

Download a better resolution of the image from my open repo: Download image ↗

Save the image in the same folder as your Jupyter Notebook and add this snippet to a new cell:

image_path = "1082.jpg"
results = retriever.retrieve(QueryBundle(query_str="red heels", image_path=image_path))

# Display results
for item in results:
score = item.score
print(f"Score: {score:.4f}")
print(f"Text: {item.get_content()}")
print("-" * 50)

You should see this output in your Jupyter Notebook:

Score: 2.2765
Text: Hm women red heels
--------------------------------------------------
Score: 2.2702
Text: Cobblerz women red heels
--------------------------------------------------
Score: 2.2631
Text: Clarks women balti zing red heels
--------------------------------------------------
Score: 2.2596
Text: Hm women red heels
--------------------------------------------------
Score: 2.2576
Text: Portia women red heels
--------------------------------------------------

Let's visualize the red heels from the vector database to confirm that the results match the query.

display_nodes_with_images_in_row(results)

When you run this, you'll see a row of heel images displayed in your Notebook:


Test without the custom embedding model

Let's define another retriever, this time without the custom embedding model, to see what happens when LlamaIndex falls back to its default embedder:

# Create a simple retriever
retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=5,  # Retrieve the top 5 results
    vector_store_query_mode="hybrid",  # Enable hybrid search
    alpha=0.5  # Weighting between semantic and keyword search
)

Let's query again and visualize the vector database pull:

# Query the vector store for "red shoes"
results = retriever.retrieve("red shoes")

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

As you can see, we get an error: the default embedder produces 1536-dimensional vectors, which don't match the 1024-dimensional Titan vectors stored in our index:

results = retriever.retrieve("red shoes")
PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 04 Mar 2025 17:46:52 GMT', 'Content-Type': 'application/json', 'Content-Length': '104', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '90', 'x-pinecone-request-id': '3238657866182661992', 'x-envoy-upstream-service-time': '2', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Vector dimension 1536 does not match the dimension of the index 1024","details":[]}
Why a 1536-dimensional vector?

That's the default embedder, OpenAI's text-embedding-ada-002:

By default LlamaIndex uses text-embedding-ada-002, which is the default embedding used by OpenAI. If you are using different LLMs you will often want to use different embeddings.

LlamaIndex Vector Store Index ↗
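If you'd rather not pass embed_model to every component, you can also make the Titan embedder the global default so nothing silently falls back to ada-002; a minimal sketch:

from llama_index.core import Settings

# Use the custom Titan embedder everywhere by default
Settings.embed_model = MultimodalEmbeddings()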

Summary

  • Custom Embedding Model: AWS Titan multimodal embeddings
  • Overriding BaseEmbedding: Enabled text + image embeddings
  • Fixed Vector Dimension Error: Titan uses 1024-dimensional vectors
  • Vector Search with Pinecone: Indexed vectors for hybrid search
  • Visualization: Displayed retrieved product images



Want to build your own AI agent? 🎯

If you're interested in AI-powered search and recommendation systems, check out my free mini-course:

Build an AI Agent for Multi-Color Product Queries

In this step-by-step course, you'll:

  • Vectorize product data using AWS Titan
  • Query a vector database with Pinecone and LlamaIndex
  • Build an AI agent that filters by multiple colors & attributes
  • Work entirely in Jupyter Notebook - no deployment needed

Join for free and start building today!