Load shoe data
Let's start by reading the SoleMates shoe dataset.
We'll transform this data into embeddings and later store everything in the cloud-based Pinecone vector database.
Go ahead and load the dataset:
Jupyter Notebook
import ast
import pandas as pd
# Load the SoleMates shoe dataset
df_shoes = pd.read_csv('data/solemates_shoe_directory.csv')
# Convert 'color_details' from string representation of a list to an actual list
df_shoes['color_details'] = df_shoes['color_details'].apply(ast.literal_eval)
# Display the first few rows of the dataset
df_shoes.head()
You'll see this table in your Notebook:
product_title | gender | product_type | color | usage | color_details | heel_height | heel_type | price_usd | brand | product_id | image |
---|---|---|---|---|---|---|---|---|---|---|---|
Puma men future cat remix sf black casual shoes | men | casual shoes | black | casual | [] | nan | nan | 220 | puma | 1 | 1.jpg |
Buckaroo men flores black formal shoes | men | formal shoes | black | formal | [] | nan | nan | 155 | buckaroo | 2 | 2.jpg |
Gas men europa white shoes | men | casual shoes | white | casual | [] | nan | nan | 105 | gas | 3 | 3.jpg |
Nike men's incinerate msl white blue shoe | men | sports shoes | white | sports | ['blue'] | nan | nan | 125 | nike | 4 | 4.jpg |
Clarks men hang work leather black formal shoes | men | formal shoes | black | formal | [] | nan | nan | 220 | clarks | 5 | 5.jpg |
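The `color_details` values are stored in the CSV as text such as `"['blue']"`, which is why the loading cell parses them with `ast.literal_eval`. Here is a minimal sketch of that conversion on two cell values taken from the table above. The variable names `raw_cells` and `parsed_cells` are only for illustration:

```python
import ast

# Cell values as they would arrive from a CSV file: plain strings, not lists
raw_cells = ["[]", "['blue']"]

# ast.literal_eval only evaluates Python literals (lists, strings, numbers, ...),
# so it is a safer choice than eval() for parsing text read from a file
parsed_cells = [ast.literal_eval(cell) for cell in raw_cells]

for raw, parsed in zip(raw_cells, parsed_cells):
    print(type(raw).__name__, "->", type(parsed).__name__, parsed)
```

After parsing, each value is a real Python list, so you can check its length or loop over the individual colors.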
The shoe product data consists of 12 columns:
- product_title
- gender
- product_type
- color
- usage
- color_details
- heel_height
- heel_type
- price_usd
- brand
- product_id
- image
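If you'd like to confirm the column list programmatically, a quick check along these lines may help. This is only a sketch: it builds a one-row stand-in frame (`df_demo`, a hypothetical name) from the third row of the table above so that it runs on its own. In your Notebook you would run the same checks on `df_shoes` instead:

```python
import pandas as pd

# The 12 column names listed above
expected_columns = [
    "product_title", "gender", "product_type", "color", "usage",
    "color_details", "heel_height", "heel_type", "price_usd",
    "brand", "product_id", "image",
]

# One-row stand-in for df_shoes, copied from the table preview
df_demo = pd.DataFrame([{
    "product_title": "Gas men europa white shoes",
    "gender": "men",
    "product_type": "casual shoes",
    "color": "white",
    "usage": "casual",
    "color_details": [],
    "heel_height": float("nan"),
    "heel_type": float("nan"),
    "price_usd": 105,
    "brand": "gas",
    "product_id": 3,
    "image": "3.jpg",
}])

# Columns we expect but cannot find in the frame
missing_columns = [col for col in expected_columns if col not in df_demo.columns]
print("Missing columns:", missing_columns)

# Count missing values in the heel columns, which show nan in the preview rows
na_counts = df_demo[["heel_height", "heel_type"]].isna().sum()
print(na_counts)
```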
Let's also have a look at the first 5 shoes:
Jupyter Notebook
import os
from IPython.display import HTML, display
width = 100
images_html = ""
image_data_path = 'data/footwear'
for img_file in df_shoes.head()['image']:
    img_path = os.path.join(image_data_path, img_file)
    # Add each image as an HTML <img> tag
    images_html += f'<img src="{img_path}" style="width:{width}px; margin-right:10px;">'
# Display all images in a row using HTML
display(HTML(f'<div style="display: flex; align-items: center;">{images_html}</div>'))
Run this cell. The output should look something like this:
![](/img/course/solemates/footwear/1.jpg)
![](/img/course/solemates/footwear/2.jpg)
![](/img/course/solemates/footwear/3.jpg)
![](/img/course/solemates/footwear/4.jpg)
![](/img/course/solemates/footwear/5.jpg)
Let's vectorize our shoe product data in the next lesson.