
Calculate cosine similarity

Next, prepare to compare the customer's query embedding with the product embeddings. Start by collecting the image embeddings from your product catalog into a list:

# Extracting only the image vectors from the DataFrame for comparison
vectors = list(df['image_embedding'])
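To make the shape of `vectors` concrete, here is a toy catalog. It assumes each entry in `image_embedding` is a 1-D NumPy array of the same length; the 4-dimensional random vectors and the image file names are made up for illustration (real image embeddings are much longer).

```python
import numpy as np
import pandas as pd

# Toy catalog: made-up file names and random 4-dimensional "embeddings"
df = pd.DataFrame({
    "image": ["bag_blue.jpg", "bag_red.jpg", "bag_black.jpg"],
    "image_embedding": [np.random.rand(4) for _ in range(3)],
})

# One 1-D array per product, in the same order as the DataFrame rows
vectors = list(df["image_embedding"])

# Stacked, the list is an (n_products, embedding_dim) matrix
matrix = np.vstack(vectors)
print(len(vectors), matrix.shape)  # 3 (3, 4)
```

Because `vectors` keeps the DataFrame's row order, the i-th similarity score computed from it belongs to the i-th product.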

Then, calculate the cosine similarity between the customer's query embedding and each product's image vector using sklearn's cosine_similarity function:

from sklearn.metrics.pairwise import cosine_similarity

# Calculate cosine similarity between the query embedding and each product's image vector
cosine_scores = cosine_similarity([query_embedding], vectors)[0]

cosine_scores is now a NumPy array holding one score per bag, indicating how similar that bag is to the customer query "Hi! I'm looking for a red bag":

(Image: the cosine_scores array)

Each score lies between -1 and 1: the closer a score is to 1, the more closely that product's image embedding points in the same direction as the query embedding.
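For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = a·b / (‖a‖ ‖b‖). The minimal NumPy sketch below reproduces that calculation for one query against several product vectors. The helper name `cosine_scores_numpy` and the 3-dimensional toy vectors are hypothetical, chosen so the expected scores are easy to check by hand.

```python
import numpy as np

def cosine_scores_numpy(query, vectors):
    """Cosine similarity between one query vector and each vector in `vectors`.

    Hypothetical helper for illustration; it does not handle all-zero vectors.
    """
    query = np.asarray(query, dtype=float)
    matrix = np.vstack(vectors).astype(float)          # (n_products, dim)
    dots = matrix @ query                               # a . b for each product
    lengths = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return dots / lengths                               # divide by ||a|| * ||b||

# Made-up vectors: same direction, orthogonal, opposite direction
query = [1.0, 0.0, 0.0]
vectors = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([-2.0, 0.0, 0.0])]
scores = cosine_scores_numpy(query, vectors)  # values 1.0, 0.0, -1.0
```

Note that the third vector is twice as long as the query yet still scores exactly -1: cosine similarity ignores vector length and compares only direction.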

To link these scores to their products, create a pandas Series that maps each product image to its score:

import pandas as pd

# Create a Series of the scores, indexed by each product's image name
score_series = pd.Series(cosine_scores, index=df['image'])
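A toy version shows what the mapping looks like. The scores and file names below are made up and stand in for `cosine_scores` and `df['image']`; the pairing works only because both are in the same row order.

```python
import numpy as np
import pandas as pd

# Made-up scores and image names (stand-ins for cosine_scores and df['image'])
cosine_scores = np.array([0.21, 0.35, 0.28])
images = pd.Series(["bag_blue.jpg", "bag_red.jpg", "bag_black.jpg"], name="image")

# The i-th score is labeled with the i-th image name
score_series = pd.Series(cosine_scores, index=images)
print(score_series["bag_red.jpg"])  # 0.35
```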

Finally, sort the scores in descending order so that the best match for the customer query "Hi! I'm looking for a red bag" comes first:

# Sort the scores in descending order
sorted_scores = score_series.sort_values(ascending=False)
sorted_scores

The output of sorted_scores should look something like this:

(Image: the sorted scores)
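If only the top few suggestions are needed, pandas' `Series.nlargest(n)` returns the n highest scores in descending order, which gives the same result as sorting and then taking the first n rows. A sketch with made-up scores and image names:

```python
import pandas as pd

# Made-up scores standing in for score_series
score_series = pd.Series(
    [0.21, 0.35, 0.28, 0.12],
    index=["bag_blue.jpg", "bag_red.jpg", "bag_black.jpg", "bag_green.jpg"],
)

# Two best matches, highest score first
top_two = score_series.nlargest(2)
print(list(top_two.index))  # ['bag_red.jpg', 'bag_black.jpg']

# Same result as sorting the whole Series and slicing
same = top_two.equals(score_series.sort_values(ascending=False).head(2))
```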

Now that we have the similarity scores, the next section shows how to display the top-ranked products.