
How to use ChatGPT API to build a chatbot for product recommendations with embeddings

· 17 min read
Norah Sakal
AI Consultant


Are you looking to build a chatbot that can recommend products to your customers based on their unique profiles? Here's a step-by-step guide that shows you how to build a chatbot using embeddings to match a user's profile with relevant products from a company's database.

You'll get the tools you need to create a customer-facing chatbot that can boost engagement and drive sales.

In this walkthrough, we'll use a beauty e-commerce company as an example, but the principles can be applied to any industry.

Need tailored AI solutions? I provide one-on-one collaboration and custom AI services for businesses.

Let's find the perfect solution for your challenges: consulting services


Here's what we'll use:

1. OpenAI API 🤖

2. Python 🐍


Here are the steps:

1. Introduction to embeddings

2. Get OpenAI API keys

3. Create a product dataset

4. Create embeddings for the product dataset

5. Create a customer profile dataset

6. Create embeddings for the customer profile dataset

7. Create embeddings for the customer chat message

8. Get previous purchase data similarities

9. Get product database similarities

10. Create ChatGPT API prompt

11. Create ChatGPT product recommendations


1. Introduction to embeddings

What are embeddings?

In natural language processing (NLP), an embedding represents words, phrases, or even entire documents as dense vectors of numerical values. These vectors are typically high-dimensional, with hundreds or even thousands of dimensions, and are designed to capture the semantic and syntactic relationships between different pieces of text data.

Embeddings are often created using neural networks trained on large amounts of text data. During training, the neural network learns to map each word to a dense vector so that words that have similar meanings or are used in similar contexts are mapped to similar vectors.

For example, the words "car" and "vehicle" might be mapped to vectors that are very close together, while the word "banana" might be mapped to a vector that is further away.

In our case, we generate embeddings for each user profile and each product in the database, which we will then use to calculate their similarity and find the best product matches for each user.
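To make "close together" concrete, here's a minimal sketch of cosine similarity, the measure used later in this guide. The three-dimensional vectors are made up for illustration and are not real model output; real embeddings from text-embedding-ada-002 have 1,536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" (assumption: chosen by hand, not produced by a model)
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

car_vs_vehicle = cosine_similarity(toy_embeddings["car"], toy_embeddings["vehicle"])
car_vs_banana = cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"])
print(car_vs_vehicle > car_vs_banana)  # True
```

The vectors for "car" and "vehicle" point in nearly the same direction, so their score is close to 1, while "car" and "banana" score close to 0.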

OpenAI embeddings for search and similarity

OpenAI's first-generation embedding models were split into separate families by use case, including:

– search
– similarity

Which one fits depends on your use case and the task you're trying to accomplish.

Search is for finding the documents or snippets that best match your input text. You compare the embedding of the input with the embedding of every document and rank the documents by score, so each result comes down to:

– document id
– score
– text of the matched document

The score measures the similarity between your input text and the matched document. A higher score means greater similarity.

Similarity is for measuring how alike two snippets of text or documents are. The result is a single score, typically the cosine similarity of the two embeddings, and the closer it is to 1, the more similar the texts are.

Two identical texts get a score of 1, and the score drops as the texts become less related.

In summary, think of search when you want to find documents similar to a given input text, and of similarity when you want to measure how alike two snippets of text or documents are.

For this guide, I'll go with similarity since we're looking for products that would fit a customer based on their customer profile. The model we'll use later, text-embedding-ada-002, replaced the separate first-generation models, so a single embeddings endpoint covers both use cases and we'll calculate the similarity scores ourselves.

Now that we know the difference between the two use cases, let's get our OpenAI API keys.


2. Get OpenAI API keys

Before we go ahead and start coding, let's get the OpenAI credentials needed for the API calls.

Go to https://beta.openai.com/, log in, click on your avatar, and select View API keys:

Open AI API keys

Then create a new secret key and save it for the request:

Create OpenAI API key

Now we have all the credentials needed to make an API request.


3. Create a product dataset

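The next step is to create a product dataset. Start by importing the OpenAI client, NumPy, and pandas. We'll be using pandas when we're working with the data in DataFrames:

from openai import OpenAI
import numpy as np
import pandas as pd

Then add your API key and initialize the OpenAI client with it. This will allow you to authenticate and access the OpenAI API using the API key we got in the previous step. The openai.embeddings_utils module is not included in version 1.0 and later of the library, so we also define the two helpers we'll need, get_embedding and cosine_similarity:

api_key = "YOUR_API_KEY"

# Initialize OpenAI
openai_client = OpenAI(
    api_key=api_key
)

# Helper: embed a text with the given model (engine)
def get_embedding(text, engine="text-embedding-ada-002"):
    response = openai_client.embeddings.create(input=text, model=engine)
    return response.data[0].embedding

# Helper: cosine similarity between two embedding vectors
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))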
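Hardcoding the key is fine for a quick local test, but you may prefer to keep it out of your notebook. Here's a minimal sketch that reads it from an environment variable instead. The helper name load_api_key is hypothetical; OPENAI_API_KEY is the variable name the OpenAI Python client looks for by default.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Usage (assumes the variable is set in your shell before starting Python):
# openai_client = OpenAI(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error on the first API call.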

Here's the made-up data I'm using:

product_data = [
    {"prod_id": 1, "prod": "moisturizer", "brand": "Aveeno", "description": "for dry skin"},
    {"prod_id": 2, "prod": "foundation", "brand": "Maybelline", "description": "medium coverage"},
    {"prod_id": 3, "prod": "moisturizer", "brand": "CeraVe", "description": "for dry skin"},
    {"prod_id": 4, "prod": "nail polish", "brand": "OPI", "description": "raspberry red"},
    {"prod_id": 5, "prod": "concealer", "brand": "Chanel", "description": "medium coverage"},
    {"prod_id": 6, "prod": "moisturizer", "brand": "Ole Henriksen", "description": "for oily skin"},
    {"prod_id": 7, "prod": "moisturizer", "brand": "CeraVe", "description": "for normal to dry skin"},
    {"prod_id": 8, "prod": "moisturizer", "brand": "First Aid Beauty", "description": "for dry skin"},
    {"prod_id": 9, "prod": "makeup sponge", "brand": "Sephora", "description": "super-soft, exclusive, latex-free foam"},
]

The brands are real, but the data is all made-up.

Let's add this product data to a Pandas DataFrame:

product_data_df = pd.DataFrame(product_data)
product_data_df

The product DataFrame should look something like this:

Made-up product data

Let's also create a new column called combined, which we'll use for the embeddings later. It concatenates the brand, product, and description:

product_data_df['combined'] = product_data_df.apply(lambda row: f"{row['brand']}, {row['prod']}, {row['description']}", axis=1)
product_data_df

The product data DataFrame should now have a new column with the combined data:

Column with combined data
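Since the same concatenation is needed again for the customer order data, it could be pulled into a small function. This is just a sketch; combine_fields is a hypothetical helper name, and it works on any dict-like row, including the rows pandas passes to apply.

```python
def combine_fields(row, fields=("brand", "prod", "description")):
    """Join the selected fields of a row into one comma-separated string."""
    return ", ".join(str(row[field]) for field in fields)

example_row = {
    "prod_id": 1,
    "prod": "moisturizer",
    "brand": "Aveeno",
    "description": "for dry skin",
}
print(combine_fields(example_row))  # Aveeno, moisturizer, for dry skin
```

With the DataFrame from this step, the call would be product_data_df.apply(combine_fields, axis=1).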

Now that the product data is ready, let's create embeddings for the new column in the next section.


4. Create embeddings for the product dataset

The next step is to create embeddings for the combined column we just created. We'll use the get_embedding helper:

get_embedding is a small helper function that sends text to OpenAI's embeddings endpoint and returns a vector representation of that text.

The embeddings are generated using a neural network trained on a large corpus of text data and are designed to capture the semantic meaning of the input text.

Here, we'll use get_embedding to generate an embedding for each product in the database. Later, we'll compare these with the embedding of the user's chat message to find the best product matches to suggest.

This represents the product data as vectors in a high-dimensional space, making it easier to calculate their similarity with the user input in the chat later:

product_data_df['text_embedding'] = product_data_df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))
product_data_df

We'll be using the embedding model text-embedding-ada-002, which is OpenAI's second-generation embedding model: https://openai.com/blog/new-and-improved-embedding-model

⚠️ This step can take several minutes depending on the amount of data.

Once finished, you'll have a new column with the numerical representation of the combined column:

Column embeddings
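If this step gets slow on a larger dataset, one option is to send several texts per request instead of one request per row, since the embeddings endpoint accepts a list of inputs. The sketch below shows the batching logic only. embed_in_batches and fake_embed_batch are hypothetical names, the batch size of 100 is an arbitrary assumption, and the fake function stands in for the real API call so the example runs offline.

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed texts chunk by chunk, keeping the original order.

    embed_batch is any callable mapping a list of strings to a list of vectors.
    """
    embeddings = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        embeddings.extend(embed_batch(chunk))
    return embeddings

def fake_embed_batch(chunk):
    """Stand-in for a real API call: a 1-dimensional 'embedding' per text."""
    return [[float(len(text))] for text in chunk]

vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]

# A real embed_batch could look like this (assumes openai_client from step 3):
# def openai_embed_batch(chunk):
#     response = openai_client.embeddings.create(
#         input=chunk, model="text-embedding-ada-002"
#     )
#     return [item.embedding for item in response.data]
```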

The product data embeddings are all set. Let's move on to the customer data in the next section.


5. Create a customer profile dataset

Now that we have the product data embeddings, let's create the customer profile data.

Ideally, this dataset would be past orders, products the customer has previously shown interest in, or any other customer-specific data you have available.

For this guide, we'll create a made-up order history for a customer. Start by creating a list of the 5 latest beauty products the customer purchased:

customer_order_data = [
    {"prod_id": 1, "prod": "moisturizer", "brand": "Aveeno", "description": "for dry skin"},
    {"prod_id": 2, "prod": "foundation", "brand": "Maybelline", "description": "medium coverage"},
    {"prod_id": 4, "prod": "nail polish", "brand": "OPI", "description": "raspberry red"},
    {"prod_id": 5, "prod": "concealer", "brand": "Chanel", "description": "medium coverage"},
    {"prod_id": 9, "prod": "makeup sponge", "brand": "Sephora", "description": "super-soft, exclusive, latex-free foam"},
]

Then add this customer order data to a Pandas DataFrame:

customer_order_df = pd.DataFrame(customer_order_data)
customer_order_df

You should now have this DataFrame:

Customer order history

Next, let's create a new column for combined purchased product data, just like we did for the product DataFrame:

customer_order_df['combined'] = customer_order_df.apply(lambda row: f"{row['brand']}, {row['prod']}, {row['description']}", axis=1)
customer_order_df

Your DataFrame with previous purchases data should now look like this:

Customer order history combined

Let's head over to the next section, where we'll create embeddings for the customer profile data.


6. Create embeddings for the customer profile dataset

Let's also create embeddings for the previous purchases the customer has made. We'll use get_embedding the same way as before:

customer_order_df['text_embedding'] = customer_order_df.combined.apply(lambda x: get_embedding(x, engine='text-embedding-ada-002'))
customer_order_df

Your DataFrame should now have a column for the text embeddings of the combined column:

Customer order history combined embeddings

We have one last text that needs embeddings: the customer's chat message. Let's do that in the next section.


7. Create embeddings for the customer chat message

We have all the data prepared and ready to work with a user question input.

Let's pretend that the customer starts a conversation with your chatbot and asks, "Hi! Can you recommend a good moisturizer for me?"

Start by adding the message as a new customer_input:

customer_input = "Hi! Can you recommend a good moisturizer for me?"

We need to create embeddings for the customer input just like we did for the product data.

Use openai_client.embeddings.create and make sure to use the same model as for the products:

response = openai_client.embeddings.create(
    input=customer_input,
    model="text-embedding-ada-002"
)
embeddings_customer_question = response.data[0].embedding

You now have a numerical representation of the user input, and we can go ahead and find product recommendations for the customer in the next step.


8. Get previous purchase data similarities

In this next step, we'll compare the embedding of the user's chat input with the embeddings of the previous purchases we created earlier.

We'll use cosine similarity since we want to find the similarities between the user input question, "Hi! Can you recommend a good moisturizer for me?", and each of their previous purchases.

Create a new column in the previous purchase product data DataFrame for the search score and call cosine_similarity for each embedding.

Next, sort the DataFrame in descending order based on the highest score.

customer_order_df['search_purchase_history'] = customer_order_df.text_embedding.apply(lambda x: cosine_similarity(x, embeddings_customer_question))
customer_order_df = customer_order_df.sort_values('search_purchase_history', ascending=False)
customer_order_df

The previous purchases DataFrame will now have a new column search_purchase_history which is the similarity score between the user question Hi! Can you recommend a good moisturizer for me? and each of the previously purchased products:

Customer order history and question similarities

We can see that the highest score, indicating high similarity, is between the user input question and the Aveeno moisturizer.

Great, we have the similarity scores for the previously purchased products. Let's make the same comparison but for all the products in our database in the next section.


9. Get product database similarities

Let's make the same comparison between the user input question and all the products in our product database, and sort the results in descending order based on the highest score:

product_data_df['search_products'] = product_data_df.text_embedding.apply(lambda x: cosine_similarity(x, embeddings_customer_question))
product_data_df = product_data_df.sort_values('search_products', ascending=False)
product_data_df

Your products DataFrame should now have a new column search_products, which is the similarity score between the user input question and each product in your database:

Products and question similarities

The highest similarity scores are for the moisturizers from CeraVe and Aveeno. The Aveeno moisturizer happens to be one of the products the customer also previously bought.

Before constructing the ChatGPT API prompt in the next step, let's create two new DataFrames with only the top 3 similarity scores.

One new DataFrame for the previously bought products with the highest similarity scores:

top_3_purchases_df = customer_order_df.head(3)
top_3_purchases_df

Top 3 previously purchased product

Let's also create a new DataFrame for the top 3 similarity scores of all the products on our database:

top_3_products_df = product_data_df.head(3)
top_3_products_df

Top 3 products in the database
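The scoring, sorting, and head(3) steps above boil down to a top-k ranking. Here's a minimal plain-Python sketch of that idea, without pandas. The function name top_k and the two-dimensional embeddings are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(records, query_embedding, k=3):
    """Return the k records whose 'text_embedding' is most similar to the query."""
    scored = [
        {**record, "score": cosine_similarity(record["text_embedding"], query_embedding)}
        for record in records
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]

# Made-up 2-dimensional embeddings, for illustration only
records = [
    {"combined": "product A", "text_embedding": [1.0, 0.0]},
    {"combined": "product B", "text_embedding": [0.0, 1.0]},
    {"combined": "product C", "text_embedding": [0.7, 0.7]},
]
best = top_k(records, query_embedding=[1.0, 0.1], k=2)
print([r["combined"] for r in best])  # ['product A', 'product C']
```

In pandas, product_data_df.nlargest(3, 'search_products') would return the same top rows as sorting in descending order and calling head(3).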


10. Create ChatGPT API prompt

The next step is to create the message objects needed as input for the ChatGPT completion function.

The ChatGPT prompts in this guide are just suggestions. You can construct the prompt in any way you want, as long as you follow the template and use a dict with a role and the message content.

From the Chat completion documentation:

"The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either "system", "user", or "assistant") and content (the content of the message). Conversations can be as short as 1 message or fill many pages." https://platform.openai.com/docs/guides/chat/introduction

Start with an empty list:

message_objects = []

Then append the first message, which is the system message. The system message helps set the behavior of the assistant:

message_objects.append({"role":"system", "content":"You're a chatbot helping customers with beauty-related questions and helping them with product recommendations"})

Here's an important note in the OpenAI API docs:

From OpenAI API docs: https://platform.openai.com/docs/guides/chat/introduction

"gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages."

After appending the system message, let's add the input message from the customer:

message_objects.append({"role":"user", "content": customer_input})

Then, let's go ahead and create a string of the previous purchases from our top 3 purchases DataFrame:

prev_purchases = ". ".join([f"{row['combined']}" for index, row in top_3_purchases_df.iterrows()])
prev_purchases

Add those purchases to a user message and append it to the array of message objects:

message_objects.append({"role":"user", "content": f"Here're my latest product orders: {prev_purchases}"})

Let's also add some additional instructions to help set the assistant's behavior.

I'm using these instructions to get a friendly reply from the model:

message_objects.append({"role":"user", "content": f"Please give me a detailed explanation of your recommendations"})
message_objects.append({"role":"user", "content": "Please be friendly and talk to me like a person, don't just give me a list of recommendations"})

Tip 💡

Tinker with the instructions in the prompt until you find the desired voice of your chatbot.

After this set of user instructions, I'm adding this assistant content to help give the model an example of desired behavior:

message_objects.append({"role": "assistant", "content": f"I found these 3 products I would recommend"})

I'll also go ahead and create a list of the top 3 products we have in our product DataFrame:

products_list = []

# Turn each top product into an assistant message
for index, row in top_3_products_df.iterrows():
    brand_dict = {'role': "assistant", "content": f"{row['combined']}"}
    products_list.append(brand_dict)
products_list

Top 3 product list

And then add those to our list of message objects with extend:

message_objects.extend(products_list)

Finally, I'll end the prompt with a last instruction:

message_objects.append({"role": "assistant", "content":"Here's my summarized recommendation of products, and why it would suit you:"})

Here's what the list of message objects looks like:

Final message objects
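If you plan to answer more than one customer, the prompt-building steps above could be wrapped in a single function. This is only a sketch: build_messages is a hypothetical name, the message texts are copied from this section, and the example inputs are illustrative strings rather than real similarity results.

```python
def build_messages(customer_input, prev_purchases, product_descriptions):
    """Assemble the message objects in the same order as in this section."""
    messages = [
        {"role": "system", "content": "You're a chatbot helping customers with beauty-related questions and helping them with product recommendations"},
        {"role": "user", "content": customer_input},
        {"role": "user", "content": f"Here're my latest product orders: {prev_purchases}"},
        {"role": "user", "content": "Please give me a detailed explanation of your recommendations"},
        {"role": "user", "content": "Please be friendly and talk to me like a person, don't just give me a list of recommendations"},
        {"role": "assistant", "content": "I found these 3 products I would recommend"},
    ]
    # One assistant message per recommended product
    messages.extend({"role": "assistant", "content": text} for text in product_descriptions)
    messages.append({"role": "assistant", "content": "Here's my summarized recommendation of products, and why it would suit you:"})
    return messages

# Illustrative inputs (assumption: not the actual top-3 results)
example_messages = build_messages(
    "Hi! Can you recommend a good moisturizer for me?",
    "Aveeno, moisturizer, for dry skin. Chanel, concealer, medium coverage",
    ["CeraVe, moisturizer, for dry skin", "Aveeno, moisturizer, for dry skin"],
)
print(len(example_messages))  # 9
```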

We have the final prompt. Let's call the ChatGPT API in the next step and see what message our customer will receive.


11. Create ChatGPT product recommendations

The final step is to call the openai_client.chat.completions.create function with our finalized list of message objects:

completion = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=message_objects
)

print(completion.choices[0].message.content)

The model's reply can be extracted with completion.choices[0].message.content and should look something like this:

ChatGPT response

This will give us the AI-generated response to our customer input question based on the previous beauty product purchase history and the product database we provided.
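For reuse, the call and the extraction of the reply could live in one small function that takes the client as a parameter. In the sketch below, get_recommendation is a hypothetical name, and the fake client only mimics the shape of the real response so the example runs without network access or an API key.

```python
from types import SimpleNamespace

def get_recommendation(client, messages, model="gpt-3.5-turbo"):
    """Call the chat completions API and return only the reply text."""
    completion = client.chat.completions.create(model=model, messages=messages)
    return completion.choices[0].message.content

# Fake client (assumption: stands in for the real OpenAI client in this demo)
def _fake_create(model, messages):
    reply = SimpleNamespace(content=f"(fake reply to {len(messages)} messages)")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print(get_recommendation(fake_client, [{"role": "user", "content": "Hi!"}]))
# (fake reply to 1 messages)
```

With the objects from this guide, the call would be get_recommendation(openai_client, message_objects).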

We're all set; this is how easy it is to leverage the power of ChatGPT to create conversational AI applications.

With just a few lines of code, we can build a simple chatbot service that can understand natural language and provide product recommendations from user questions.


Summary

Here's a summary of what we did:

1. Introduced embeddings

2. Obtained OpenAI API keys

3. Created a product dataset

4. Created embeddings for the product dataset using get_embedding

5. Created a customer profile dataset

6. Created embeddings for the customer profile dataset using get_embedding

7. Created embeddings for the customer chat message using openai_client.embeddings.create

8. Calculated similarities between the customer's previous purchases and the customer's chat input question using cosine similarity

9. Calculated similarities between the customer's chat input question and the products in the database using cosine similarity

10. Created a ChatGPT API prompt to initiate the chatbot conversation

11. Generated product recommendations using the ChatGPT API


Improvements

While this guide provides a solid foundation for building a chatbot that recommends products based on a user's profile and available product data, several areas can be improved for more accurate and relevant recommendations.

Product data: In this guide, we used a minimal dataset with basic product and customer information, so the generated product information in the ChatGPT API response is made-up. It's recommended to use an extensive dataset with detailed product information.

Increase the number of previously purchased products: This walkthrough only uses the top three previously purchased products to make recommendations. To broaden the scope of product suggestions, it would be beneficial to use a larger set of previously purchased products.

Extract the product type from the user input: Extracting the product type from the user input question can help the model make more accurate recommendations by including the relevant product types in the suggestions.
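As a starting point for the last improvement, here's a minimal sketch that looks for known product types in the user's message with a simple case-insensitive substring match. extract_product_types is a hypothetical helper, and a production version would likely need something more robust, such as handling plurals and synonyms.

```python
def extract_product_types(user_input, known_types):
    """Return the known product types mentioned in the user input."""
    text = user_input.lower()
    return [t for t in known_types if t.lower() in text]

# Product types taken from the made-up product data in this guide
known_types = ["moisturizer", "foundation", "nail polish", "concealer", "makeup sponge"]

found = extract_product_types("Hi! Can you recommend a good moisturizer for me?", known_types)
print(found)  # ['moisturizer']
```

The result could then be used to filter the product DataFrame before calculating similarities.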


Troubleshooting

AttributeError

AttributeError: module 'openai' has no attribute 'ChatCompletion'

This probably means that the version of your Python client library for the OpenAI API is lower than 0.27.0. Note that the code in this guide uses the OpenAI client class, which requires version 1.0.0 or later.

Run pip install openai --upgrade in your terminal for the latest version and make sure it is at least 1.0.0:

Upgrade OpenAI package
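To confirm the upgrade worked, you could compare the installed version against the minimum you need. The helper names below are hypothetical, and the parsing is deliberately simple: it only understands dotted numeric versions.

```python
def version_tuple(version):
    """Turn '0.27.0' into (0, 27, 0); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def is_at_least(installed, required):
    """True if the installed version is the required version or newer."""
    return version_tuple(installed) >= version_tuple(required)

print(is_at_least("0.26.5", "0.27.0"))  # False
print(is_at_least("1.3.0", "1.0.0"))    # True

# To check your own environment (requires the openai package to be installed):
# from importlib.metadata import version
# print(is_at_least(version("openai"), "1.0.0"))
```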


InvalidRequestError

InvalidRequestError: This model's maximum context length is 4096 tokens

This indicates that the message_objects list sent to the ChatGPT API has exceeded the maximum allowed length of 4096 tokens.

You will need to shorten the length of your messages to resolve the issue.
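One way to shorten the prompt is to drop messages until a rough token estimate fits the budget. The sketch below assumes about 4 characters per token, which is a heuristic and not an exact count; for exact numbers you could use a tokenizer such as OpenAI's tiktoken library. The function names are hypothetical, and which messages are safe to drop depends on your prompt.

```python
def estimate_tokens(message, chars_per_token=4):
    """Very rough token estimate for one message (heuristic, not exact)."""
    return len(message["content"]) // chars_per_token + 1

def trim_messages(messages, max_tokens):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    trimmed = list(messages)
    while sum(estimate_tokens(m) for m in trimmed) > max_tokens:
        for i, message in enumerate(trimmed):
            if message["role"] != "system":
                del trimmed[i]
                break
        else:
            break  # only system messages are left; nothing more to drop
    return trimmed

messages = [
    {"role": "system", "content": "s" * 40},
    {"role": "user", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
print(len(trim_messages(messages, max_tokens=30)))  # 2
```

In this guide's prompt, a gentler option is to pass fewer products or previous purchases into the message list in the first place.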


Next steps

1. Repo with source code: Here is the repo with a Jupyter notebook with all the source code if you'd like to implement this on your own ⬇️ https://github.com/norahsakal/chatgpt-product-recommendation-embeddings

2. Do you need help with getting started with the ChatGPT API? Or do you have other questions? I'm happy to help, don't hesitate to reach out ➡️ norah@quoter.se

Or shoot me a DM on Twitter @norahsakal