Optimize data
In the previous section, we used AI to recommend products based on the straightforward customer inquiry, "I'm looking for light blue women's jeans".
All images/dataset used throughout this guide are from: Aggarwal, P. (2022). Fashion Product Images (Small). Available online: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset
The results were promising:
[Image table: the top five recommended products (Rank 1 to Rank 5) for text vectors, image vectors, and text+image vectors]
But what happens when we try a more complex inquiry like "I'm looking for women's jeans for a summer party"?
Let's first think about what kind of jeans we, as humans, would recommend knowing only this. Light colors, lightweight fabrics, and perhaps wide-leg jeans for a flowy feel might come to mind.
With this in mind, let's look at our data again. Here's a subset:
Product Name | Gender | Year | Season |
---|---|---|---|
Peter England Men Party Blue Jeans | Men | 2012 | Summer |
Jealous 21 Women Black Jeans | Women | 2012 | Summer |
Jealous 21 Women Black Jegging | Women | 2011 | Fall |
Tokyo Talkies Women Navy Slim Fit Jeans | Women | 2012 | Summer |
Locomotive Men Washed Blue Jeans | Men | 2011 | Fall |
Enriching data
In our previous example, we only vectorized the product description, which leaves out both the season and the year.
If you had to recommend jeans for a summer party, knowing only the color isn't enough. By enriching the product descriptions with additional data, we can provide more context for the AI model.
Here's how we could enrich our product descriptions by adding data from the other columns:
Column | How to enrich vector |
---|---|
Color | Visual matching and a common search attribute |
Season and Year | Provide temporal context, hinting at the style period |
Enrichment example
Here's an example of how we could enrich our original product descriptions with season and year:
Before | After |
---|---|
Jealous 21 Women Black Jeans | Summer 2012 casual black jeans for women by Jealous 21, ideal for everyday wear |
Consider a customer inquiry like:
"I'm looking for women's jeans for a summer party"
Suppose the season is not mentioned in the product description that we've vectorized. In that case, you're solely relying on AI to draw its own conclusions about what is considered suitable for a summer party.
For instance, the AI model might focus on colors and recommend jeans in light colors.
By adding the season to the product description, we give the AI additional context and reduce the reliance on AI inference alone.
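To make the effect of the added context concrete, here is a minimal sketch that compares the customer inquiry against an original and an enriched description. It uses a toy bag-of-words cosine similarity as a stand-in for a real embedding model, which is an assumption made purely for illustration: the model used in this guide captures meaning beyond literal word overlap, so actual scores will differ. The helper names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "I'm looking for women's jeans for a summer party"
original = "Tokyo Talkies Women Navy Slim Fit Jeans"
# Enriched with the season and year from the product data.
enriched = "Summer 2012 Tokyo Talkies Women Navy Slim Fit Jeans"

query_vec = tokenize(query)
score_original = cosine_similarity(query_vec, tokenize(original))
score_enriched = cosine_similarity(query_vec, tokenize(enriched))

print(f"original: {score_original:.3f}")
print(f"enriched: {score_enriched:.3f}")
```

In this toy setup, the enriched description scores higher only because it now shares the word "summer" with the inquiry. A real embedding model is less literal, but the principle is the same: information that is absent from the vectorized text cannot contribute to the match.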
AI product recommendations
Let's test our new customer inquiry on our original vectorized data to get an idea of how the AI model reasoned:
[Image table: the top five recommended products (Rank 1 to Rank 5) for text vectors, image vectors, and text+image vectors]
This one is trickier. Let's also look at the season for each recommended pair of jeans:
Comparing seasons of different vectors
This comparison shows that while the text vector recommendations include darker jeans, most of the recommendations are labeled as summer jeans.
Let's look at how we could optimize the product descriptions by enriching them with season and year.
Enrich product descriptions
Let's enrich our product descriptions by adding season and year to each one. Here's a subset showing what the new jeans descriptions look like:
Before | After |
---|---|
Peter England Men Party Blue Jeans | Summer 2012 Peter England Men Party Blue Jeans |
Jealous 21 Women Black Jeans | Summer 2012 Jealous 21 Women Black Jeans |
Jealous 21 Women Black Jegging | Fall 2011 Jealous 21 Women Black Jegging |
Tokyo Talkies Women Navy Slim Fit Jeans | Summer 2012 Tokyo Talkies Women Navy Slim Fit Jeans |
Locomotive Men Washed Blue Jeans | Fall 2011 Locomotive Men Washed Blue Jeans |
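The enrichment in the table above amounts to prefixing each product name with its season and year. A minimal sketch of that step follows, assuming the product rows are available as simple records. The `Product` class and the `enrich_description` function are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    gender: str
    year: int
    season: str


def enrich_description(product: Product) -> str:
    """Prefix the product name with its season and year."""
    return f"{product.season} {product.year} {product.name}"


# The subset of rows shown earlier in this section.
products = [
    Product("Peter England Men Party Blue Jeans", "Men", 2012, "Summer"),
    Product("Jealous 21 Women Black Jeans", "Women", 2012, "Summer"),
    Product("Jealous 21 Women Black Jegging", "Women", 2011, "Fall"),
    Product("Tokyo Talkies Women Navy Slim Fit Jeans", "Women", 2012, "Summer"),
    Product("Locomotive Men Washed Blue Jeans", "Men", 2011, "Fall"),
]

enriched = [enrich_description(p) for p in products]
for text in enriched:
    print(text)
```

The enriched strings, rather than the original names, are then what gets vectorized.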
AI recommendations on optimized data
Let's look at the AI product recommendations on the optimized data, starting with the text vectors:
Comparing seasons of different vectors
The optimized product data shows that the new product recommendations are all labeled as summer for the text vectors and all but one for the combined text and image vectors.
The pure image vectors remain unchanged since the optimization only affects the product text.
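A season comparison like this one can be produced by tallying the season label of each recommended product. The sketch below assumes each recommendation is a `(product name, season)` pair. The input reuses rows from the data subset shown earlier purely as illustrative values; it is not the actual recommendation output, and `season_counts` and `season_share` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Tuple

Recommendation = Tuple[str, str]  # (product name, season label)


def season_counts(recommendations: List[Recommendation]) -> Counter:
    """Count how many recommendations carry each season label."""
    return Counter(season for _, season in recommendations)


def season_share(recommendations: List[Recommendation], season: str) -> float:
    """Fraction of recommendations labeled with the given season."""
    if not recommendations:
        return 0.0
    matches = sum(1 for _, s in recommendations if s.lower() == season.lower())
    return matches / len(recommendations)


# Illustrative input only: rows from the data subset, not real results.
sample = [
    ("Peter England Men Party Blue Jeans", "Summer"),
    ("Jealous 21 Women Black Jeans", "Summer"),
    ("Jealous 21 Women Black Jegging", "Fall"),
    ("Tokyo Talkies Women Navy Slim Fit Jeans", "Summer"),
    ("Locomotive Men Washed Blue Jeans", "Fall"),
]

print(season_counts(sample))
print(season_share(sample, "summer"))
```

Running the same tally on the top five results of each vectorization, before and after enrichment, gives a quick way to check whether the season context is being picked up.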
Product recommendations
Let's also look at the actual jeans recommendations. Here they are visually:
[Image table: the top five recommended products (Rank 1 to Rank 5) for text vectors, image vectors, and text+image vectors]
Comparison: original vs. optimized data
Here's an overview of the original product recommendations compared side-by-side with the recommendations based on the optimized data:
Text vec. original | Text vec. optimized | Image vectors | Text+image vec. original | Text+image vec. optimized |
---|---|---|---|---|
We've seen how naive vectorization of original product descriptions can be improved by enriching descriptions with additional data.
Next, we'll explore another optimization technique: hybrid search.