Vectorize data

Now that we've examined the data, let's proceed to the next step: vectorizing the product data.

All images and data used throughout this guide are from: Aggarwal, P. (2022). Fashion Product Images (Small). Available online: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset

Product columns

In our dataset, we have 4 columns:

Product data columns
Color
Season
Year
Description

Product images

In addition to these columns, we also have product images.

Here's a subset of our jeans collection:

  • Peter England Men Party Blue Jeans
  • Jealous 21 Women Black Jeans
  • Jealous 21 Women Black Jegging
  • Tokyo Talkies Women Navy Slim Fit Jeans
  • Locomotive Men Washed Blue Jeans

Vectorizing data

Different strategies

We can vectorize our product data using three different strategies:

  1. Vectorize product texts
  2. Vectorize product images
  3. Vectorize both product texts and product images

Let's examine how the AI recommendations vary with different vectorization strategies.

1. Vectorizing only product texts

We start by vectorizing only the product texts. This approach is useful if you don't have images, or if the product images don't contain visual features relevant to customer inquiries.

For instance, here's how the product text vector looks for one pair of jeans:

| Product text to vectorize | Product text vector |
| --- | --- |
| Jealous 21 Women Black Jeans | [0.021333912387490273, -0.01840313896536827, ....] |

After vectorizing all the jeans descriptions, we can run a customer inquiry like "I'm looking for light blue women's jeans". This straightforward inquiry mentions gender and color. Let's see the AI model's performance with just the text vectors.
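To make the idea of matching an inquiry against text vectors concrete, here is a minimal sketch. It assumes a toy bag-of-words vector as a stand-in for a real embedding model and ranks products by cosine similarity to the inquiry. The helper names (`toy_embed`, `rank_products`, and so on) are hypothetical and are not part of this guide's setup; a real embedding model captures meaning, while this toy version only counts shared words.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Map each distinct token to a vector index, in first-seen order."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: token counts over the vocabulary."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[token]] += 1.0
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_products(query, products, top_k=5):
    """Return the top_k (score, product text) pairs, most similar first."""
    vocab = build_vocab(products)
    query_vec = toy_embed(query, vocab)
    scored = [(cosine_similarity(query_vec, toy_embed(p, vocab)), p) for p in products]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


products = [
    "Peter England Men Party Blue Jeans",
    "Jealous 21 Women Black Jeans",
    "Locomotive Men Washed Blue Jeans",
    "Elle Women Light Blue Jeans",
    "ONLY Women Blue Jeans",
]

ranked = rank_products("I'm looking for light blue women's jeans", products, top_k=3)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

In this toy setup, the product text that shares the most words with the inquiry ranks first. Swapping `toy_embed` for a call to an actual embedding model would leave the ranking step unchanged.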

Product recommendations

Based on this customer inquiry, the AI model provides these jeans recommendations in order:

  • Elle Women Light Blue Jeans
    Rank: 1
  • ONLY Women Blue Jeans
    Rank: 2
  • ONLY Women Blue Jeans
    Rank: 3
  • Lee Womens Blue Jeans
    Rank: 4
  • Lee Womens Blue Jeans
    Rank: 5

The top 5 recommendations are light blue, but the top 10 and top 15 include some darker blue jeans.

Let's see how the AI model performs using only the images.

2. Vectorizing only product images

Next, we vectorize only the product images.
Here's how the product image vector looks for one pair of jeans:

| Product image to vectorize | Product image vector |
| --- | --- |
| Image: Jealous 21 Women Black Jeans | [0.0037902002, 0.018807068, -0.0027826785,....] |

Product recommendations

Based on the customer inquiry, the AI model now provides these jeans recommendations:

  • Lee Women SS Blue Jeans
    Rank: 1
  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 2
  • Jealous 21 Women Washed Blue Jeans
    Rank: 3
  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 4
  • Puma Women Blue Jeans
    Rank: 5

In contrast to the text vector recommendations, which included some darker blue jeans further down the list, all the jeans recommended by the AI model here are light blue.

3. Vectorizing both product texts and images

In this experiment, we'll vectorize both the product text and product images.
Here's how the combined vector looks for one pair of jeans:

| Product text to vectorize | Product image to vectorize | Product text-image vector |
| --- | --- | --- |
| Jealous 21 Women Black Jeans | (image not shown) | [0.0037902002, 0.018807068, -0.0027826785,....] |
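One possible way to produce a single vector from a text vector and an image vector is a weighted average of the two. The sketch below works under that assumption and is not necessarily how the vectors in this guide were produced. It assumes both vectors have the same dimension, as they would if they came from a shared text-image embedding model. The names `combine_vectors` and `text_weight` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine_vectors(text_vec, image_vec, text_weight=0.5):
    """Weighted average of the normalized text and image vectors (assumed strategy)."""
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same dimension")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_unit = l2_normalize(text_vec)
    image_unit = l2_normalize(image_vec)
    mixed = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_unit, image_unit)
    ]
    # Normalize again so the combined vector also has unit length
    return l2_normalize(mixed)


# Tiny made-up vectors, for illustration only
combined = combine_vectors([1.0, 0.0], [0.0, 1.0])
text_only = combine_vectors([1.0, 0.0], [0.0, 1.0], text_weight=1.0)
print(combined)
print(text_only)
```

Normalizing each input first keeps one modality from dominating just because its vector is longer. `text_weight` then controls how much the text contributes relative to the image.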

Product recommendations

Based on the customer inquiry, the AI model now provides these jeans recommendations:

  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 1
  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 2
  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 3
  • Jealous 21 Women Washed Light Blue Jeans
    Rank: 4
  • Lee Womens Blue Jeans
    Rank: 5

Observations

The top 5 recommendations are light blue jeans, followed by a range of darker jeans. Compared to the text-only recommendations, these include more light-colored jeans, but fewer than the image-only recommendations.

Side-by-side comparison

Here's a table comparing the top 5 product recommendations for the customer inquiry "I'm looking for light blue women's jeans":

| Vectorization | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
| --- | --- | --- | --- | --- | --- |
| Text vectors | Product 57077 | Product 32560 | Product 32567 | Product 50957 | Product 50954 |
| Image vectors | Product 9147 | Product 26995 | Product 27000 | Product 27002 | Product 58561 |
| Text+image vectors | Product 26995 | Product 27002 | Product 27001 | Product 27023 | Product 50954 |
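To see how much the three strategies agree, we can count the products that each pair of strategies has in common. The short sketch below uses the product IDs from the table above. The variable and function names are only illustrative.

```python
from itertools import combinations

# Top 5 product IDs per vectorization strategy, copied from the comparison table
recommendations = {
    "text": [57077, 32560, 32567, 50957, 50954],
    "image": [9147, 26995, 27000, 27002, 58561],
    "text+image": [26995, 27002, 27001, 27023, 50954],
}


def shared_products(first, second):
    """Return the product IDs that appear in both top-5 lists, sorted ascending."""
    return sorted(set(first) & set(second))


overlaps = {
    (a, b): shared_products(recommendations[a], recommendations[b])
    for a, b in combinations(recommendations, 2)
}

for (a, b), shared in overlaps.items():
    print(f"{a} vs {b}: {len(shared)} shared {shared}")
```

The text-only and image-only lists share no products. The combined strategy shares one product with the text-only list and two with the image-only list.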

This approach to vectorizing product data is quite basic. Optimizing the product descriptions can further improve the AI model's product recommendations.

Let's discuss how we can enrich our product descriptions in the next section.