Load shoe data

Let's start by reading the SoleMates shoe dataset.

We'll transform this data into embeddings and later store everything in the cloud-based Pinecone vector database.

Go ahead and load the dataset:

Jupyter Notebook
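# Imports used by this cell (skip if you already ran them in an earlier cell)
import ast

import pandas as pd

# Load the SoleMates shoe dataset
df_shoes = pd.read_csv('data/solemates_shoe_directory.csv')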

# Convert 'color_details' from string representation of a list to an actual list
df_shoes['color_details'] = df_shoes['color_details'].apply(ast.literal_eval)

# Display the first few rows of the dataset
df_shoes.head()

You'll see this table in your Notebook:

http://localhost:3000/notebooks/jupyter_notebook.ipynb
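| product_title | gender | product_type | color | usage | color_details | heel_height | heel_type | price_usd | brand | product_id | image |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Puma men future cat remix sf black casual shoes | men | casual shoes | black | casual | [] | nan | nan | 220 | puma | 1 | 1.jpg |
| Buckaroo men flores black formal shoes | men | formal shoes | black | formal | [] | nan | nan | 155 | buckaroo | 2 | 2.jpg |
| Gas men europa white shoes | men | casual shoes | white | casual | [] | nan | nan | 105 | gas | 3 | 3.jpg |
| Nike men's incinerate msl white blue shoe | men | sports shoes | white | sports | ['blue'] | nan | nan | 125 | nike | 4 | 4.jpg |
| Clarks men hang work leather black formal shoes | men | formal shoes | black | formal | [] | nan | nan | 220 | clarks | 5 | 5.jpg |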
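
The color_details column is stored in the CSV as text such as "['blue']", which is why the loading cell passes it through ast.literal_eval. Here is a minimal sketch of that conversion, using made-up sample strings rather than values read from the real file:

```python
import ast

# Hypothetical raw values, shaped like the color_details strings in the CSV
raw_values = ["[]", "['blue']", "['blue', 'white']"]

# literal_eval parses Python literals only; it does not execute arbitrary code
parsed = [ast.literal_eval(value) for value in raw_values]

for raw, value in zip(raw_values, parsed):
    print(f"{raw!r} -> {value!r} ({type(value).__name__})")
```

After the conversion, each entry is a real list, so you can loop over the colors or check membership without doing string matching.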

The shoe product data consists of 12 columns:

  • product_title
  • gender
  • product_type
  • color
  • usage
  • color_details
  • heel_height
  • heel_type
  • price_usd
  • brand
  • product_id
  • image
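
If you want to confirm the column list on your own copy of the data, one possible check is sketched below. So that it runs without the real file, it builds a small stand-in DataFrame from an inline CSV string. The two rows are copied from the table above, with the empty heel columns left blank. The names expected_columns and df_sample are hypothetical and are not part of the lesson's code.

```python
import io

import pandas as pd

# The 12 column names listed above
expected_columns = [
    "product_title", "gender", "product_type", "color", "usage",
    "color_details", "heel_height", "heel_type", "price_usd",
    "brand", "product_id", "image",
]

# Stand-in for the real CSV: a header plus two rows from the table above
sample_csv = io.StringIO(
    ",".join(expected_columns) + "\n"
    "Puma men future cat remix sf black casual shoes,men,casual shoes,"
    "black,casual,[],,,220,puma,1,1.jpg\n"
    "Nike men's incinerate msl white blue shoe,men,sports shoes,"
    "white,sports,['blue'],,,125,nike,4,4.jpg\n"
)
df_sample = pd.read_csv(sample_csv)

# Report any expected column that is absent, then count missing values
missing = [col for col in expected_columns if col not in df_sample.columns]
print("Columns found:", len(df_sample.columns))
print("Missing columns:", missing)
print(df_sample.isna().sum())
```

In the Notebook, you could run the same checks against df_shoes instead of df_sample. The missing-value counts also show which columns, such as heel_height and heel_type, are empty for many shoes.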

Let's also look at the images of the first five shoes:

Jupyter Notebook
# Imports used by this cell (skip if you already ran them in an earlier cell)
import os

from IPython.display import HTML, display

width = 100
images_html = ""
image_data_path = 'data/footwear'
for img_file in df_shoes.head()['image']:
    img_path = os.path.join(image_data_path, img_file)
    # Add each image as an HTML <img> tag
    images_html += f'<img src="{img_path}" style="width:{width}px; margin-right:10px;">'
# Display all images in a row using HTML
display(HTML(f'<div style="display: flex; align-items: center;">{images_html}</div>'))

Run this cell. The output should look something like this:

http://localhost:3000/notebooks/jupyter_notebook.ipynb
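
If some images fail to render, the files may be missing from the image folder. Here is a minimal sketch of a check, assuming the same folder-plus-filename layout as the cell above. The helper find_missing_images is hypothetical, and the demo runs against a temporary folder rather than data/footwear:

```python
import os
import tempfile


def find_missing_images(image_dir, image_files):
    """Return the file names that do not exist inside image_dir."""
    return [
        name for name in image_files
        if not os.path.isfile(os.path.join(image_dir, name))
    ]


# Demo on a throwaway folder that contains only two of three expected files
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("1.jpg", "2.jpg"):
        open(os.path.join(tmp_dir, name), "wb").close()
    missing_files = find_missing_images(tmp_dir, ["1.jpg", "2.jpg", "3.jpg"])

print(missing_files)
```

In the Notebook, you could call find_missing_images(image_data_path, df_shoes['image']) to list any file names in the dataset that have no matching image on disk.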

Let's vectorize our shoe product data in the next lesson.