Vectorizing data

Before we dive deeper into enhancing our online jeans store, it's crucial to understand why and how we vectorize product data.

All images/dataset used throughout this guide are from: Aggarwal, P. (2022). Fashion Product Images (Small). Available online: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset

Vectorization, or creating embeddings, is the process of converting product data, such as product descriptions and images, into numerical vectors that AI models can process:

Jeans vectors

Embeddings are mathematical representations of data

Why Vectorize Jeans Data?

In any online store, the variety and specifics of products like jeans - different colors, styles, and materials - need to be searchable in a way that matches customer inquiries with the most relevant products:

Embedded customer inquiry

The customer inquiry needs to be vectorized to be matched with the most relevant products

Traditionally, systems might rely on simple keyword matches, which can miss nuances in customer preferences or product descriptions.

Vectorizing Jeans

The illustration below shows how an embedding model converts items like jeans and a customer inquiry into numerical form. Each item is represented by a dense 300-dimensional vector, a compact array of real numbers, where each element encodes some aspect of the item's characteristics:

Jeans vectors

Mathematical representations of jeans and a customer inquiry
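To make the shape of this data concrete, here is a minimal Python sketch. The names `EmbeddedItem` and `placeholder_embed` are hypothetical, and the values are seeded random numbers rather than the output of a real embedding model; only the layout (300 real numbers per item) mirrors the description above.

```python
import random
from dataclasses import dataclass
from typing import List

EMBEDDING_DIM = 300  # dimensionality stated in the text above


@dataclass
class EmbeddedItem:
    """A piece of text paired with its vector (hypothetical structure)."""

    text: str
    vector: List[float]

    def __post_init__(self) -> None:
        # Every item must live in the same 300-dimensional space.
        if len(self.vector) != EMBEDDING_DIM:
            raise ValueError(
                f"expected {EMBEDDING_DIM} values, got {len(self.vector)}"
            )


def placeholder_embed(text: str) -> EmbeddedItem:
    """Stand-in for a real embedding model (assumption for illustration).

    The numbers carry no meaning; a trained model would place similar
    items close together, which random values do not.
    """
    rng = random.Random(text)  # seeding with the text makes the output repeatable
    return EmbeddedItem(text, [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)])


jeans = placeholder_embed("light blue slim-fit jeans")
inquiry = placeholder_embed("comfortable jeans for everyday wear")

print(len(jeans.vector), len(inquiry.vector))
print(jeans.vector[:3])  # first three of the 300 values
```

Products and customer inquiries end up in the same structure, which is what makes it possible to compare them directly.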

Visualizing Jeans in Vector Space

In our vector space, each point represents a unique pair of jeans, and their proximity to one another is based on similar characteristics such as fit, color, and style.

In the illustration, you can see clusters of jeans:

Clusters of jeans in vector space

Light blue jeans form one cluster, indicating their similarity to each other, while being distinct from clusters of dark blue and grey jeans. The vector space also reveals the relationships between different styles - notice how slim-fit jeans are positioned relative to boot-cut ones, reflecting their shared attributes and differences.
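One common way to measure this kind of proximity is cosine similarity. The sketch below assumes that measure, since this section does not name a specific one, and uses made-up 3-dimensional vectors in place of real 300-dimensional embeddings. The axis meanings (lightness, blueness, slimness) are invented for illustration; real embedding dimensions are not individually interpretable like this.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors: (lightness, blueness, slimness)
light_blue_slim_a = [0.9, 0.8, 0.9]
light_blue_slim_b = [0.8, 0.9, 0.8]
dark_blue_boot_cut = [0.2, 0.9, 0.1]

same_cluster = cosine_similarity(light_blue_slim_a, light_blue_slim_b)
cross_cluster = cosine_similarity(light_blue_slim_a, dark_blue_boot_cut)

print(f"within the light blue slim-fit cluster: {same_cluster:.3f}")
print(f"light blue slim-fit vs dark blue boot-cut: {cross_cluster:.3f}")
```

The two light blue slim-fit pairs score higher against each other than either does against the dark blue boot-cut pair, which is what "forming a cluster" means numerically.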

This organized layout in vector space is not just a theoretical concept; it's a practical tool that our AI model uses to identify and recommend products. When a customer searches for light blue slim-fit jeans, the system can easily locate this cluster and suggest closely related options:

Product recommendations mapped in vector space

It can also show alternatives from nearby clusters, perhaps a pair of grey slim-fit jeans that the customer may also like, thus broadening their choices without straying too far from their original intent:

Exploring related options in vector space clusters
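The recommendation step described above can be sketched as a nearest-neighbor search: embed the query, score every product against it, and return the closest few. This is a minimal illustration, not the store's actual implementation. The catalog, the 3-dimensional vectors (light-blueness, greyness, slimness), the `top_matches` helper, and the choice of cosine similarity are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_matches(
    query: Sequence[float], catalog: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k products most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vectors: (light-blueness, greyness, slimness)
catalog = {
    "light blue slim-fit": [0.9, 0.1, 0.9],
    "light blue boot-cut": [0.9, 0.1, 0.1],
    "grey slim-fit": [0.1, 0.9, 0.9],
    "grey boot-cut": [0.1, 0.9, 0.1],
}

# Hand-written stand-in for the embedded query "light blue slim-fit jeans"
query_vector = [1.0, 0.0, 1.0]

results = top_matches(query_vector, catalog, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The best match comes from the query's own cluster, while grey slim-fit still appears among the top three as a nearby alternative. A brute-force scan like this is fine for a handful of products; larger catalogs typically rely on an index built for approximate nearest-neighbor search.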

The Limitations of Keyword Matching

Keyword matches can fail or underperform in several scenarios:

The limitations of keyword matching

Relying only on keyword matches can lead to poor customer experiences. Here's why:

1. Typos

Even a small typo can derail a search. When "genes" is typed instead of "jeans," a keyword match might return irrelevant products or no results at all.

2. Synonyms

Different words for the same item, like "denims" for "jeans," might not be recognized by a strict keyword match system, narrowing the search results.

3. Context

A search for "navy jeans" could be misread as a request for military uniforms if the system doesn't recognize "navy" as a color in this context.

4. Slang

Fashion terms evolve, and what's known as "skinnies" might not be matched if the system only knows "skinny jeans."

5. Descriptive Searches

A request for "jeans comfortable for a long flight" describes a specific use case that keyword search isn't nuanced enough to understand.

Similarly, a query like "80s retro jeans" implies a style that a simple search algorithm, not tuned to fashion trends, may fail to recognize.

Each of these examples shows the common problems of basic keyword-matching systems. They highlight the need for a smarter approach that can understand and process the nuances of human language and intent in retail.
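Several of these failures are easy to reproduce with a deliberately naive matcher. The sketch below is an assumption about what "simple keyword matching" might look like, not a description of any particular search engine: a product matches only if its title contains every word of the query exactly. The product titles and the `keyword_search` helper are hypothetical.

```python
from typing import List

# Hypothetical product titles
products = [
    "light blue slim-fit jeans",
    "dark blue boot-cut jeans",
    "grey skinny jeans",
]


def keyword_search(query: str, titles: List[str]) -> List[str]:
    """Return titles that contain every query word as an exact token."""
    query_words = set(query.lower().split())
    return [t for t in titles if query_words <= set(t.lower().split())]


queries = [
    "grey jeans",     # exact words: works
    "grey genes",     # typo
    "grey denims",    # synonym
    "grey skinnies",  # slang
]

for query in queries:
    print(f"{query!r} -> {keyword_search(query, products)}")
```

Only the first query finds the grey skinny jeans. The other three return nothing, even though a person would recognize all four as asking for the same product.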

Overcoming keyword limitations with vectorization

By vectorizing data, we can overcome these limitations. Vectors let us build a multi-dimensional space in which products are not isolated keywords but points positioned relative to one another, capturing subtleties of meaning, use case, and customer intent.

This is why our online jeans store will use the power of vectorization for an improved and intuitive shopping experience.

Let's vectorize our jeans in the next section.