Vectorize product data

The next step is to vectorize the shoe product data:

We need to generate a vector, a numerical representation, for each shoe in our shoe database.

Cost of vectorization

Vectorizing datasets with AWS Bedrock and the Titan multimodal model involves costs based on the number of input tokens and images:

  • Text embeddings: $0.0008 per 1,000 input tokens

  • Image embeddings: $0.00006 per image

The provided SoleMates shoe dataset is fairly small, containing 1,306 pairs of shoes, making it affordable to vectorize.

For this dataset, I calculated the cost of vectorization and summarized the token counts below:

  • Token count: 12746 tokens
  • Images: 1306
  • Total cost: $0.0885568

Calculations:

(12746 tokens / 1000) × $0.0008 + (1306 images × $0.00006) = $0.0885568
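The arithmetic above can be reproduced in a few lines of Python, using the per-unit rates and counts quoted earlier:

```python
# Bedrock Titan multimodal pricing quoted above
TEXT_RATE_PER_1K_TOKENS = 0.0008  # USD per 1,000 input tokens
IMAGE_RATE = 0.00006              # USD per image

token_count = 12746
image_count = 1306

text_cost = (token_count / 1000) * TEXT_RATE_PER_1K_TOKENS
image_cost = image_count * IMAGE_RATE
total_cost = text_cost + image_cost

print(f"Total vectorization cost: ${total_cost:.7f}")
# → Total vectorization cost: $0.0885568
```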

Pre-embedded dataset

Note

This step is entirely optional and designed to accommodate various levels of access and resources.

If you prefer not to generate embeddings or don't have access to AWS, you can use a pre-embedded dataset included in the repository.

This file contains all the embeddings and token counts, so you can follow this guide without incurring additional costs.

Tip

For hands-on experience, I recommend running the embedding process yourself to understand the workflow better.

The pre-embedded dataset is located at:

data/solemates_shoe_directory_with_embeddings_token_count.csv

To load the dataset in your Jupyter Notebook, use the following code:

Jupyter Notebook
import ast

import pandas as pd

# Load the pre-embedded dataset
df_shoes_with_embeddings = pd.read_csv('data/solemates_shoe_directory_with_embeddings_token_count.csv')

# Convert string representations of the embeddings back into Python lists
df_shoes_with_embeddings['titan_embedding'] = df_shoes_with_embeddings['titan_embedding'].apply(ast.literal_eval)

By using this pre-embedded dataset, you can skip the embedding step and continue directly with the rest of the walkthrough.
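The `ast.literal_eval` step is needed because `pd.read_csv` reads each embedding back as a string such as `"[0.12, 0.34]"`, not as a list. A minimal, self-contained sketch of that round trip, using a made-up two-dimensional embedding rather than the real Titan vectors:

```python
import ast
import io

import pandas as pd

# Tiny stand-in CSV: embeddings are stored as stringified lists,
# the same shape the pre-embedded dataset uses
csv_data = io.StringIO(
    "product,titan_embedding\n"
    'red sneaker,"[0.12, 0.34]"\n'
    'blue loafer,"[0.56, 0.78]"\n'
)

df = pd.read_csv(csv_data)
print(type(df.loc[0, 'titan_embedding']))  # <class 'str'> (still a string)

# Parse the string representation back into a real Python list
df['titan_embedding'] = df['titan_embedding'].apply(ast.literal_eval)
print(type(df.loc[0, 'titan_embedding']))  # <class 'list'>
print(df.loc[0, 'titan_embedding'][0])     # 0.12
```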


Let's get started with Amazon Bedrock in the next lesson.