Cosine Similarity in Vector Space
When we turn customer questions and product descriptions into vectors, we move into a space where we can directly compare and measure how similar items are.
But how exactly do we compare these vectors and get the right product?
This is where we'll use cosine similarity.
Understanding Cosine Similarity
Cosine similarity measures how close two vectors are by calculating the cosine of the angle between them in a multi-dimensional space.
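When two vectors point in exactly the same direction, the angle between them is 0 degrees and the cosine is 1, the highest possible score. As the angle widens, the cosine falls, so a lower score means the vectors are less similar. Only the direction of the vectors matters here, not their length.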
For example, if we match the vector for a customer's request for "red yoga pants" against the vector for red yoga pants, the small angle suggests they're very similar, giving a cosine score near 1.
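On the other hand, if we compare it with a vector for yellow yoga pants, the bigger angle means they're not as similar, so the cosine score is lower. It stays well above -1, though: a score of -1 would mean the vectors point in opposite directions, and these two products still share most of their meaning.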
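To make the definition concrete, here is a minimal sketch of the calculation in plain Python: the dot product of the two vectors divided by the product of their lengths. The function name cosine_similarity and the toy vectors are illustrative assumptions, not part of any particular library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # close to 1
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])     # 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])        # close to -1

print(same_direction, perpendicular, opposite)
```

Note that the first pair scores close to 1 even though one vector is twice as long as the other, because only the direction is compared.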
Simplifying Search with Cosine Similarity
Using cosine similarity, we can compare a customer's search query in vector form against all product vectors in our database to rank products. The closer a product's vector is to the query vector, indicated by a higher cosine score, the higher it ranks as a match to the customer's needs.
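The ranking step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the 4-dimensional vectors are invented by hand (real embeddings come from a model and have many more dimensions), and rank_products and catalog are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_products(query_vector, product_vectors):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in product_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical vectors, invented for illustration only.
# Dimensions loosely stand for: red, yellow, yoga pants, bag.
catalog = {
    "yellow tote bag": [0.0, 0.9, 0.0, 0.9],
    "yellow yoga pants": [0.1, 0.9, 0.9, 0.0],
    "red yoga pants": [0.9, 0.1, 0.9, 0.0],
}
query = [1.0, 0.0, 1.0, 0.0]  # stands in for the query "red yoga pants"

ranked = rank_products(query, catalog)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy values, red yoga pants rank first, yellow yoga pants second, and the yellow tote bag last. Scanning every product this way is fine for a small catalog; larger catalogs typically rely on a vector index instead.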
Example: red yoga pants
Let's say a customer searches for red yoga pants. Cosine similarity helps our search algorithm prioritize red yoga pants by recognizing their vector is closer (has a higher cosine similarity score) to the customer's query vector.
In contrast, a less similar product, like yellow yoga pants, has a wider angle to the query vector, resulting in a lower similarity score.
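To see the "wider angle" in numbers, the sketch below converts the cosine back into an angle in degrees. The 2-dimensional vectors are hypothetical values chosen so they are easy to picture, and angle_degrees is an illustrative helper, not a library function.

```python
import math

def angle_degrees(a, b):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    # Guard against rounding that lands slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))

# Hypothetical 2-dimensional vectors, invented for illustration only.
query = [1.0, 0.1]               # "red yoga pants" query
red_yoga_pants = [0.9, 0.2]
yellow_yoga_pants = [0.3, 0.9]

red_angle = angle_degrees(query, red_yoga_pants)
yellow_angle = angle_degrees(query, yellow_yoga_pants)

print(f"red yoga pants: {red_angle:.1f} degrees")
print(f"yellow yoga pants: {yellow_angle:.1f} degrees")
```

With these toy values, the red product sits about 7 degrees from the query and the yellow product about 66 degrees, which is why the red product gets the higher cosine score.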
Cosine similarity for e-commerce
This is why embeddings and cosine similarity work so well for semantic search. They let our application capture the context and nuance of a customer's question and return semantically relevant products, not just keyword matches.
Let's go back to our product catalog of bags and create embeddings for each product in the next section, so we can match them with customer inquiries.