Vectorize data
Now that we've examined the data, let's proceed to the next step: vectorizing the product data.
All images/dataset used throughout this guide are from: Aggarwal, P. (2022). Fashion Product Images (Small). Available online: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset
Product columns
In our dataset, we have 4 columns:
Product data columns |
---|
Color |
Season |
Year |
Description |
Product images
In addition to these columns, we also have product images.
Here's a subset of our jeans collection:
- Peter England Men Party Blue Jeans
- Jealous 21 Women Black Jeans
- Jealous 21 Women Black Jegging
- Tokyo Talkies Women Navy Slim Fit Jeans
- Locomotive Men Washed Blue Jeans
Vectorizing data
Different strategies
When vectorizing our product data, we can use one of three strategies:
- Vectorize product texts
- Vectorize product images
- Vectorize both product texts and product images
Let's examine how the AI recommendations vary with different vectorization strategies.
1. Vectorizing only product texts
We start with vectorizing only the product texts. This approach can be useful if you either don't have images or if the product images don't contain visual features relevant to customer inquiries.
For instance, here's how the product text vector looks for one pair of jeans:
Product text to vectorize | Product text vector |
---|---|
Jealous 21 Women Black Jeans | [0.021333912387490273, -0.01840313896536827, ....] |
After vectorizing all the jeans descriptions, we can run a customer inquiry like "I'm looking for light blue women's jeans". This straightforward inquiry mentions gender and color. Let's see the AI model's performance with just the text vectors.
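The guide does not state how the AI model compares the inquiry with the product vectors. A common approach in vector search is to vectorize the inquiry with the same model and rank products by cosine similarity. The sketch below assumes that approach; the function names and the three-dimensional toy vectors are invented for illustration and are not taken from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_products(query_vector, product_vectors, top_k=5):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in product_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors, made up for this example only.
toy_vectors = {
    "product_a": [0.9, 0.1, 0.0],
    "product_b": [0.1, 0.9, 0.0],
    "product_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_products(toy_query, toy_vectors, top_k=2)
for name, score in ranking:
    print(name, round(score, 3))
```

In a real setup, the query vector would come from vectorizing the inquiry text, and the product vectors from the vectorized product texts.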
Product recommendations
Based on this customer inquiry, the AI model provides these jeans recommendations in order:
- Elle Women Light Blue Jeans (Rank: 1)
- ONLY Women Blue Jeans (Rank: 2)
- ONLY Women Blue Jeans (Rank: 3)
- Lee Womens Blue Jeans (Rank: 4)
- Lee Womens Blue Jeans (Rank: 5)
The top 5 recommendations are definitely light blue, but the top 10 and top 15 include some darker blue.
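One way to make an observation like this measurable is precision at k: the share of the top k recommendations that match the attribute the customer asked for. The sketch below assumes each recommendation has been labeled by hand with its color; the labels in the example are hypothetical and do not reproduce the actual results.

```python
def precision_at_k(ranked_labels, target, k):
    """Fraction of the first k labels that equal the target label."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]
    if not top:
        return 0.0
    matches = sum(1 for label in top if label == target)
    return matches / len(top)


# Hypothetical hand-assigned color labels, in rank order.
hypothetical_labels = [
    "light blue", "light blue", "light blue", "light blue", "light blue",
    "dark blue", "light blue", "dark blue", "light blue", "dark blue",
]

p5 = precision_at_k(hypothetical_labels, "light blue", 5)
p10 = precision_at_k(hypothetical_labels, "light blue", 10)
print(p5, p10)
```

Computing the same metric for each vectorization strategy would give a single number per strategy to compare.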
Let's see how the AI model performs using only the images.
2. Vectorizing only product images
Next, we vectorize only the product images.
Here's how the product image vector looks for one pair of jeans:
Product image to vectorize | Product image vector |
---|---|
(product image) | [0.0037902002, 0.018807068, -0.0027826785,....] |
Product recommendations
Based on the customer inquiry, the AI model now provides these jeans recommendations:
- Lee Women SS Blue Jeans (Rank: 1)
- Jealous 21 Women Washed Light Blue Jeans (Rank: 2)
- Jealous 21 Women Washed Blue Jeans (Rank: 3)
- Jealous 21 Women Washed Light Blue Jeans (Rank: 4)
- Puma Women Blue Jeans (Rank: 5)
In contrast to the text vector recommendations, all of the jeans the AI model recommends here are light blue.
3. Vectorizing both product texts and images
In this experiment, we'll vectorize both the product text and product images.
Here's how the combined vector looks for one pair of jeans:
Product text to vectorize | Product image to vectorize | Product text-image vector |
---|---|---|
Jealous 21 Women Black Jeans | (product image) | [0.0037902002, 0.018807068, -0.0027826785,....] |
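The guide does not say how the combined text-image vector is produced; a multimodal model may compute it directly from both inputs. Purely as an illustration, the sketch below assumes the text and image vectors live in the same embedding space with the same dimension, and blends them with a weighted average of their unit-normalized forms. The function names and the `text_weight` parameter are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def combine_vectors(text_vector, image_vector, text_weight=0.5):
    """Blend two same-dimension vectors into one unit-length vector."""
    if len(text_vector) != len(image_vector):
        raise ValueError("vectors must have the same dimension")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_unit = l2_normalize(text_vector)
    image_unit = l2_normalize(image_vector)
    mixed = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_unit, image_unit)
    ]
    # Normalize again so the result is comparable under cosine similarity.
    return l2_normalize(mixed)


# Toy two-dimensional vectors, invented for this example.
combined = combine_vectors([1.0, 0.0], [0.0, 2.0])
print(combined)
```

Raising `text_weight` pulls the combined vector toward the text, and lowering it favors the image, which is one way to trade off between the two signals.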
Product recommendations
Based on the customer inquiry, the AI model now provides these jeans recommendations:
- Jealous 21 Women Washed Light Blue Jeans (Rank: 1)
- Jealous 21 Women Washed Light Blue Jeans (Rank: 2)
- Jealous 21 Women Washed Light Blue Jeans (Rank: 3)
- Jealous 21 Women Washed Light Blue Jeans (Rank: 4)
- Lee Womens Blue Jeans (Rank: 5)
Observations
The top 5 recommendations are light blue jeans, followed by a range of darker jeans. These recommendations include more light-colored jeans than the text-only vector recommendations, but fewer than the image-only vector recommendations.
Comparisons side by side
Here's a table showing the top 5 product recommendations for the customer inquiry "I'm looking for light blue women's jeans" for comparison:
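Vectorization | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
---|---|---|---|---|---|
Text vectors | Elle Women Light Blue Jeans | ONLY Women Blue Jeans | ONLY Women Blue Jeans | Lee Womens Blue Jeans | Lee Womens Blue Jeans |
Image vectors | Lee Women SS Blue Jeans | Jealous 21 Women Washed Light Blue Jeans | Jealous 21 Women Washed Blue Jeans | Jealous 21 Women Washed Light Blue Jeans | Puma Women Blue Jeans |
Text+image vectors | Jealous 21 Women Washed Light Blue Jeans | Jealous 21 Women Washed Light Blue Jeans | Jealous 21 Women Washed Light Blue Jeans | Jealous 21 Women Washed Light Blue Jeans | Lee Womens Blue Jeans |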
This approach to vectorizing product data is quite basic. Optimizing the product descriptions can further improve the AI model's product recommendations.
Let's discuss how we can enrich our product descriptions in the next section.