Optimize data

In the previous section, we used AI to recommend products based on the straightforward customer inquiry, "I'm looking for light blue women's jeans".

All images and data used throughout this guide are from: Aggarwal, P. (2022). Fashion Product Images (Small). Available online: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset

The results were promising:

| Vectorization | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
| --- | --- | --- | --- | --- | --- |
| Text vectors | Product 57077 | Product 32560 | Product 32567 | Product 50957 | Product 50954 |
| Image vectors | Product 9147 | Product 26995 | Product 27000 | Product 27002 | Product 58561 |
| Text+image vectors | Product 26995 | Product 27002 | Product 27001 | Product 27023 | Product 50954 |

But what happens when we try a more complex inquiry like "I'm looking for women's jeans for a summer party"?

Let's first think about what kind of jeans we, as humans, would recommend knowing only this. Light colors, lightweight fabrics, and perhaps wide-leg cuts for a flowy feel might come to mind.

With this in mind, let's look at our data again. Here's a subset:

| Product Name | Gender | Year | Season |
| --- | --- | --- | --- |
| Peter England Men Party Blue Jeans | Men | 2012 | Summer |
| Jealous 21 Women Black Jeans | Women | 2012 | Summer |
| Jealous 21 Women Black Jegging | Women | 2011 | Fall |
| Tokyo Talkies Women Navy Slim Fit Jeans | Women | 2012 | Summer |
| Locomotive Men Washed Blue Jeans | Men | 2011 | Fall |

Enriching data

In the previous example, we vectorized only the product description, which leaves out both the season and the year.

If you had to recommend jeans for a summer party, knowing only the color isn't enough. By enriching the product descriptions with additional data, we can provide more context for the AI model.

Here's how we could enrich our product descriptions by adding data from the other columns:

| Column | How it enriches the vector |
| --- | --- |
| Color | Visual matching and a common search attribute |
| Season and Year | Provide temporal context, hinting at the style period |

Enrichment example

Here's an example of how we could enrich our original product descriptions with season and year:

| Before | After |
| --- | --- |
| Jealous 21 Women Black Jeans | Fall 2011 casual black jeans for women by Jealous 21, ideal for everyday wear |

Reasoning behind adding seasons

Consider a customer inquiry like:

"I'm looking for women's jeans for a summer party"

If the season is not mentioned in the product description that we've vectorized, we rely solely on the AI model to draw its own conclusions about what is suitable for a summer party.

The AI model might, for instance, focus on colors and recommend jeans in light colors.

By adding seasons to the product description, we provide the AI with additional context, reducing the reliance on AI inference alone.

AI product recommendations

Let's test our new customer inquiry on our original vectorized data to get an idea of how the AI model reasoned:

| Vectorization | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
| --- | --- | --- | --- | --- | --- |
| Text vectors | Product 51502 | Product 32567 | Product 32560 | Product 32559 | Product 51505 |
| Image vectors | Product 58561 | Product 11343 | Product 12331 | Product 13346 | Product 9147 |
| Text+image vectors | Product 50954 | Product 50960 | Product 50950 | Product 50957 | Product 27926 |
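
This guide does not show how the rankings are computed. As a rough sketch, and assuming cosine similarity as the scoring function (a common choice for vector search, not confirmed here), a top-k ranking could look like the following. The three-dimensional vectors and the product keys (`jeans_a`, `jeans_b`, `jeans_c`) are made up for illustration; real embeddings come from a model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vector, product_vectors, k=5):
    """Return the k (product_id, score) pairs most similar to the query."""
    scored = [
        (product_id, cosine_similarity(query_vector, vector))
        for product_id, vector in product_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors, not real embeddings from the dataset.
product_vectors = {
    "jeans_a": [0.9, 0.1, 0.0],
    "jeans_b": [0.2, 0.8, 0.1],
    "jeans_c": [0.7, 0.6, 0.1],
}
query_vector = [1.0, 0.2, 0.0]

ranking = top_k(query_vector, product_vectors, k=2)
for product_id, score in ranking:
    print(product_id, round(score, 3))
```

The same function works whether the vectors come from text, images, or a combination of both, since it only compares numbers.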

This one is trickier. Let's also look at the season for each recommended pair of jeans:

[Figure: Comparing seasons of different vectors]

This comparison shows that while the text vector recommendations include darker jeans, most of the recommendations are labeled as summer jeans.

Let's look at how we could optimize the product description by enriching it with season and year.

Enrich product descriptions

Let's enrich our product descriptions by adding the season and year to each one. Here's a subset showing what the new jeans descriptions look like:

| Before | After |
| --- | --- |
| Peter England Men Party Blue Jeans | Summer 2012 Peter England Men Party Blue Jeans |
| Jealous 21 Women Black Jeans | Summer 2012 Jealous 21 Women Black Jeans |
| Jealous 21 Women Black Jegging | Fall 2011 Jealous 21 Women Black Jegging |
| Tokyo Talkies Women Navy Slim Fit Jeans | Summer 2012 Tokyo Talkies Women Navy Slim Fit Jeans |
| Locomotive Men Washed Blue Jeans | Fall 2011 Locomotive Men Washed Blue Jeans |
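
The enrichment in this table amounts to prefixing each product name with its season and year. Here is a minimal sketch of that step, assuming the product data is available as simple records; the `Product` class and the `enrich_description` function are illustrative names, not part of any specific library.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    gender: str
    year: int
    season: str


def enrich_description(product: Product) -> str:
    """Prefix the product name with its season and year."""
    return f"{product.season} {product.year} {product.name}"


# Rows taken from the data subset shown earlier in this guide.
products = [
    Product("Peter England Men Party Blue Jeans", "Men", 2012, "Summer"),
    Product("Jealous 21 Women Black Jeans", "Women", 2012, "Summer"),
    Product("Jealous 21 Women Black Jegging", "Women", 2011, "Fall"),
]

enriched = [enrich_description(p) for p in products]
for text in enriched:
    print(text)
```

The enriched strings, rather than the original names, are what would then be passed to the text vectorization step.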

AI recommendations on optimized data

Let's look at the AI product recommendations on the optimized data, starting with the text vectors:

[Figure: Comparing seasons of different vectors]

The optimized product data shows that the new product recommendations are all labeled as summer for the text vectors and all but one for the combined text and image vectors.
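
A check like "how many of the top five are labeled as summer" can be expressed as a small helper. The sketch below is one way to do it; the product keys and season labels are placeholders for illustration and are not taken from the dataset.

```python
def season_share(product_ids, season_by_id, season="Summer"):
    """Return the fraction of recommended products labeled with `season`."""
    if not product_ids:
        return 0.0
    matches = sum(1 for pid in product_ids if season_by_id.get(pid) == season)
    return matches / len(product_ids)


# Placeholder labels, invented for this example.
season_by_id = {
    "p1": "Summer",
    "p2": "Summer",
    "p3": "Fall",
    "p4": "Summer",
    "p5": "Winter",
}
top5 = ["p1", "p2", "p3", "p4", "p5"]

share = season_share(top5, season_by_id)
print(f"{share:.0%} of the recommendations are labeled Summer")
```

Running the same helper on the rankings before and after enrichment gives a simple number to compare for each vectorization strategy.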

Pure image vector recommendations stay the same

The pure image vectors remain unchanged since the optimization only affects the product text.

Product recommendations

Let's also look at the actual jeans that were recommended. Here are the product recommendations for each vectorization strategy:

| Vectorization | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
| --- | --- | --- | --- | --- | --- |
| Text vectors | Product 32561 | Product 32567 | Product 32560 | Product 32559 | Product 51500 |
| Image vectors | Product 58561 | Product 11343 | Product 12331 | Product 13346 | Product 9147 |
| Text+image vectors | Product 32561 | Product 9146 | Product 32567 | Product 32560 | Product 50945 |

Comparison: original vs. optimized data

Here's an overview of the original product recommendations compared side-by-side with the recommendations based on the optimized data:

| Rank | Text vec. original | Text vec. optimized | Image vectors | Text+image vec. original | Text+image vec. optimized |
| --- | --- | --- | --- | --- | --- |
| 1 | Denizen Women Greenish Blue Jeans | ONLY Women Peach Jeans | Puma Women Blue Jeans | Lee Womens Blue Jeans | ONLY Women Peach Jeans |
| 2 | ONLY Women Blue Jeans | ONLY Women Blue Jeans | Lee Men Blue Chicago Fit Jeans | Lee Womens Blue Jeans | Lee Women Mid Stone Blue Maxi Fit Jeans |
| 3 | ONLY Women Blue Jeans | ONLY Women Blue Jeans | Spykar Women Ep Jeans Blue Jeans | Lee Womens Blue Jeans | ONLY Women Blue Jeans |
| 4 | ONLY Women Black Jeans | ONLY Women Black Jeans | Spykar Women Washed Blue Jeans | Lee Womens Blue Jeans | ONLY Women Blue Jeans |
| 5 | Denizen Women Blue Jeans | Denizen Women Black Jeans | Lee Women SS Blue Jeans | Scullers For Her Women Blue Jeans | Lee Womens Blue Maxi Fit Jeans |
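
To quantify how much the enrichment changed each ranking, the two sets of results can be compared by product ID. The sketch below uses the product IDs reported in this section for the summer party inquiry; the `compare` helper is an illustrative name.

```python
# Top-5 product IDs for the summer party inquiry, as listed in this section.
original = {
    "text": [51502, 32567, 32560, 32559, 51505],
    "image": [58561, 11343, 12331, 13346, 9147],
    "text+image": [50954, 50960, 50950, 50957, 27926],
}
optimized = {
    "text": [32561, 32567, 32560, 32559, 51500],
    "image": [58561, 11343, 12331, 13346, 9147],
    "text+image": [32561, 9146, 32567, 32560, 50945],
}


def compare(before, after):
    """Split the new ranking into products kept from, and new to, the old one."""
    kept = [pid for pid in after if pid in before]
    added = [pid for pid in after if pid not in before]
    return kept, added


summary = {name: compare(original[name], optimized[name]) for name in original}
for name, (kept, added) in summary.items():
    print(f"{name}: kept {kept}, new {added}")
```

For the text vectors, three of the five products are retained and two are new. The image vector ranking is identical, and the text+image ranking is replaced entirely.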

We've seen how naive vectorization of original product descriptions can be improved by enriching descriptions with additional data.

Next, we'll explore another optimization technique: hybrid search.