Cosine Similarity in Vector Space

When we turn customer questions and product descriptions into vectors, we move into a space where we can directly compare and measure how similar items are.

But how exactly do we compare these vectors and get the right product?

How exactly do we compare these vectors?

This is where we'll use cosine similarity.

Understanding Cosine Similarity

Cosine similarity measures how close two vectors are by calculating the cosine of the angle between them in a multi-dimensional space.
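Written out, the definition above takes the following form. The symbols are assumptions introduced here for illustration: $A$ and $B$ are two $n$-dimensional vectors with components $a_i$ and $b_i$, and $\theta$ is the angle between them.

```latex
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \; \sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

Because both vectors are divided by their lengths, the result depends only on the direction in which they point, not on their magnitude.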

Cosine determines how similar two vectors are

This technique checks the cosine of the angle between two vectors; if the vectors point in exactly the same direction, the angle is 0 degrees and the cosine value is 1, which means they are as similar as this measure can express.

For example, if we compare the vector for a customer's request for "red yoga pants" against the vector for a red yoga pants product, the small angle between them suggests they're very similar, giving a cosine score near 1.

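On the other hand, if we compare it with a vector for yellow yoga pants, the bigger angle means they're not as similar, giving a lower cosine score. Scores range from 1 (same direction) through 0 (at 90 degrees, unrelated) down to -1 (opposite directions), so a related but different product like this one scores below the red pants rather than anywhere near -1.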
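To make the calculation concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the two-dimensional sample vectors are assumptions chosen for illustration; real embeddings have hundreds or thousands of dimensions and would usually be handled with a numerical library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative toy vectors, not real embeddings.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction: close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 90 degrees apart: 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction: close to -1
```

Note that `[1.0, 2.0]` and `[2.0, 4.0]` differ in length but still score close to 1, because only the direction of the vectors matters.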

Simplifying Search with Cosine Similarity

Using cosine similarity, we can compare a customer's search query in vector form against all product vectors in our database to rank products. The closer a product's vector is to the query vector, indicated by a higher cosine score, the higher it ranks as a match to the customer's needs.
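The ranking step described above can be sketched as follows. Everything here is hypothetical: the three-dimensional vectors are made-up stand-ins for real embeddings, and the names `products`, `query` and `rank_products` are assumptions for this example rather than part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_products(query_vector, product_vectors):
    """Score every product against the query and sort best match first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in product_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real system would get these from an embedding model.
query = [0.9, 0.1, 0.8]  # stands in for the query "red yoga pants"
products = {
    "yellow yoga pants": [0.2, 0.9, 0.8],
    "leather handbag": [0.1, 0.2, 0.05],
    "red yoga pants": [0.85, 0.15, 0.75],
}

ranking = rank_products(query, products)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up vectors, the red yoga pants rank first, followed by the yellow yoga pants and then the handbag. Scoring every product in turn is fine for a small catalog; larger catalogs typically rely on a vector index instead of a full scan.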

Example: red yoga pants

Let's say a customer searches for red yoga pants. Cosine similarity helps our search algorithm prioritize red yoga pants by recognizing their vector is closer (has a higher cosine similarity score) to the customer's query vector:

High Cosine Similarity: Matching Customer Queries

In contrast, a less similar product, like yellow yoga pants, has a wider angle to the query vector, resulting in a lower similarity score:

Large Angle, Low Similarity: Divergent Product Match

Cosine Similarity for E-commerce

This is why embeddings and cosine similarity are so useful for semantic search. They allow our application to capture the context and nuances of customer questions and return semantically relevant products, not just keyword matches:

High Cosine Similarity: Matching Customer Queries

Let's go back to our product catalog of bags and create embeddings for each product in the next section, so we can match them with customer inquiries.