Chatbots are becoming increasingly popular for providing quick and efficient customer support, answering questions, and helping users navigate through complex tasks.
In this blog post, I'll walk you through the process of building an AI-powered chatbot using ReactJS, OpenAI API and AWS Lambda.
The chatbot is designed to engage in conversation based on the user's input.
Do you want the full source code? This tutorial is quite extensive, and I've prepared the entire source code.
Visit this page to download the entire source code. You'll get instant access to the files, which you can use as a reference or as a starting point for your own voice-powered ChatGPT bot project.
Here's the simple chatbot interface we'll be building together:
Here are some of its key features:
1. Text-based interaction: users can type their questions.
2. Voice input and output: users can send voice messages, and the chatbot can transcribe them and reply with both text and audio responses.
3. Context-aware conversations: the chatbot leverages the OpenAI ChatGPT API to maintain context during the conversation, which enables coherent interactions.
We'll be using the following technologies:
1. ReactJS: a popular JavaScript library for building user interfaces.
2. OpenAI API: powered by GPT-3.5-turbo to generate human-like responses.
3. AWS Lambda: a serverless compute service where we can run our backend code without provisioning or managing servers. We'll use Lambda to handle audio transcription, text-to-speech, and calling the OpenAI API.
4. Material UI: a popular React UI framework with components and styling.
5. ElevenLabs API: a powerful API developed by ElevenLabs that offers state-of-the-art text-to-speech, voice cloning, and synthetic voice design capabilities.
In the upcoming sections, I'll guide you through the entire process of building the chatbot, from setting up the frontend and backend to deploying the chatbot.
Let's get started!
1. Create a new ReactJS app
To begin, create a parent folder for your new chatbot project; we'll build out the folder structure in the next steps.
Navigate to the location where you'd like to keep your project, then run the following command in your terminal or command prompt:
mkdir your-project-name
Replace your-project-name
with the desired name for your chatbot project. Then navigate to that new folder by running the following command:
cd your-project-name
Then, let's create a new ReactJS app using create-react-app. This command-line tool helps us quickly set up a React project with the necessary build configurations and folder structure.
Run the following command in your terminal to create the app:
npx create-react-app frontend
After the project is created, navigate into the folder and start the development server:
cd frontend
npm start
This command will launch a new browser window, showing the default React app starter template:
Now that our React app is up and running, let's install the required libraries.
2. Install libraries
We'll need several libraries for our chatbot project.
First, we'll use Material UI (MUI) v5
for styling and UI components. MUI is a fully-loaded component library and design system with production-ready components.
To install MUI, run the following command inside the frontend folder that was created earlier:
npm install @mui/material @emotion/react @emotion/styled
Additionally, we'll install MUI's icon package, which provides a set of SVG icons exported as React components:
npm install @mui/icons-material
Next, we'll need a library to handle microphone recording and output the audio as an mp3 file.
For this guide, we'll use the mic-recorder-to-mp3 library, but you can pick any library that records your microphone and outputs an mp3 file:
npm install mic-recorder-to-mp3
The mic-recorder-to-mp3
library also enables playback of recorded audio, which is a useful feature for our chatbot.
Finally, let's install aws-amplify
. This library will help us send the recorded audio to our backend using AWS Amplify:
npm install aws-amplify
With all the necessary libraries installed, we're ready to start building the audio recording functionality for our chatbot app.
3. Create the chat interface components
In this section, we'll build the components needed for a simple chatbot interface that allows users to record audio, stop recording, playback the recorded audio, and upload the audio to the backend:
We'll create the following components for the chat interface:
1. ChatHeader - displays the chatbot title and header information
2. ChatMessages - displays the chat messages exchanged between the user and the chatbot
3. AudioControls - provides the audio controls, including recording and uploading audio
4. MessageInput - provides the text input option
5. ResponseFormatToggle - lets the user choose to receive audio responses in addition to text responses
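To give you a sense of where we're heading, here's a rough preview of how these components will compose inside the App function. This is only a sketch; the exact props each component receives are added step by step in the subsections below.
// Preview: the component skeleton we'll build up over the following subsections
function App() {
  return (
    <Container maxWidth="sm" sx={{ pt: 2 }}>
      <ChatHeader />
      <ChatMessages messages={messages} />
      <AudioControls />          {/* recording, playback and upload */}
      <MessageInput />           {/* text input with a send button */}
      <ResponseFormatToggle />   {/* audio response on/off switch */}
    </Container>
  );
}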
Let's start by changing the title of the app. Open up public/index.html
and change the title
tag to your desired name:
<title>ChatGPT audio chatbot</title>
Create React App comes with hot reloading and ES6 support, so you should already see the change in the browser tab:
Let's now set up our App.js
file.
Open App.js
from your src
folder and remove all the code within the return
statement.
Also, delete the default logo import and keep the React import (we'll add the useState hook a bit later). Your App.js file should now look like this:
import React from "react";
import './App.css';
function App() {
return (
);
}
export default App;
Now, let's import the necessary MUI
components, such as Container
and Grid
.
Wrap your app with a Container
component and add a maxWidth
of sm
to keep the window narrow for the chat interface. Additionally, add some padding to the top.
Your App.js
should now look like this:
import React from "react";
import './App.css';
// Material UI
import { Container, Grid } from '@mui/material';
function App() {
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
</Container>
);
}
export default App;
3.1. Create the ChatHeader component
The ChatHeader
component will display the chatbot title and any relevant header information. This component will be positioned at the top of the chat interface.
Start by creating a ChatHeader component inside your App function. We'll use Typography, so import that component from MUI:
import { Typography } from '@mui/material';
Then, define the ChatHeader
component with a headline for the chatbot:
const ChatHeader = () => {
return(
<Typography variant="h4" align="center" gutterBottom>
Voice Chatbot
</Typography>
)
}
The Typography
component from MUI
is used to display text in a consistent and responsive manner. The variant
prop sets the font size and style, while align
adjusts the text alignment, and gutterBottom
adds a bottom margin to create a space below the headline.
Next, include the ChatHeader
component in the return
statement of the App
function:
return(
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
</Container>
)
By adding the ChatHeader
component to the Container
, it is now integrated into the overall layout of the application.
Your app should now look like this:
3.2. Create the ChatMessages component
The ChatMessages
component will display chat messages exchanged between the user and the chatbot. It should update dynamically as new messages are added:
First, create an initial greeting message from the chatbot inside your App
function:
const mockMessages = [
{
role: 'assistant',
content: 'Hello, how can I help you today?',
text: 'Hello, how can I help you today?'
},
];
Then, import the useState
hook and save the mockMessages
to the state:
import React, { useState } from "react";
const [messages, setMessages] = useState(mockMessages);
Each object in the messages
array will have 3 keys:
- role: determines whether it's the chatbot or the user talking
- text: the text shown in the app
- content: the text we send to the backend to create a completion
The text
key will store both text and React components, which we'll get back to later.
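To make this concrete, here's a purely illustrative example of what the messages array might look like after a text message and an audio message (the base64 string and id values are placeholders):
// Illustrative example of the messages state after a few exchanges
const exampleMessages = [
  { role: "assistant", content: "Hello, how can I help you today?", text: "Hello, how can I help you today?" },
  { role: "user", content: "What's the weather like?", text: "What's the weather like?", audio: null },
  // Audio messages also carry an Audio object and an id so they can be updated with the transcription later
  { role: "user", content: "🎤 Audio Message", text: "🎤 Audio Message", audio: new Audio("data:audio/mpeg;base64,..."), id: 1681395200000 },
];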
Import necessary components from MUI
, such as List
, ListItem
, ListItemText
, Box
, and Paper
:
import {
Box, // Add to imports
Container,
Grid,
IconButton, // Add to imports
List, // Add to imports
ListItem, // Add to imports
ListItemText, // Add to imports
Paper, // Add to imports
Typography
} from "@mui/material";
To style our components, import useTheme
and styled
from MUI
:
import { useTheme } from '@mui/material/styles';
import { styled } from '@mui/system';
Before creating the chat area, define three styles for the chat messages: one for user messages
, one for agent messages
and one for the MessageWrapper
that wraps both messages inside your App
function.
The user messages
style should use the audio
prop to adjust the padding for the audio icon:
const UserMessage = styled('div', { shouldForwardProp: (prop) => prop !== 'audio' })`
position: relative;
background-color: ${({ theme }) => theme.palette.primary.main};
color: ${({ theme }) => theme.palette.primary.contrastText};
padding: ${({ theme }) => theme.spacing(1, 2)};
padding-right: ${({ theme, audio }) => (audio ? theme.spacing(6) : theme.spacing(2))};
border-radius: 1rem;
border-top-right-radius: 0;
align-self: flex-end;
max-width: 80%;
word-wrap: break-word;
`;
Then create the styling for the Agent messages:
const AgentMessage = styled('div')`
position: relative;
background-color: ${({ theme }) => theme.palette.grey[300]};
color: ${({ theme }) => theme.palette.text.primary};
padding: ${({ theme }) => theme.spacing(1, 2)};
border-radius: 1rem;
border-top-left-radius: 0;
align-self: flex-end;
max-width: 80%;
word-wrap: break-word;
`;
Finally, let's create the styling for the MessageWrapper
that wraps both the agent messages and the user messages:
const MessageWrapper = styled('div')`
display: flex;
margin-bottom: ${({ theme }) => theme.spacing(1)};
justify-content: ${({ align }) => (align === 'user' ? 'flex-end' : 'flex-start')};
`;
Each message in the ChatMessages
will have a play
icon if any audio is available, so we'll import a fitting icon component:
import VolumeUpIcon from '@mui/icons-material/VolumeUp';
Now, create a ChatMessages
component that displays the messages from the messages
array:
const ChatMessages = ({messages}) => {
}
Then add a useTheme
hook to access the MUI
theme:
const ChatMessages = ({messages}) => {
const theme = useTheme();
}
To improve user experience, we want the chat window to automatically scroll to the bottom whenever a new message is added to the conversation.
Start by importing useEffect
and useRef
hooks from React:
import React, {
useEffect, // Add to imports
useRef, // Add to imports
useState
} from "react";
useEffect
allows us to run side effects, such as scrolling the chat window, in response to changes in the component's state or properties. useRef
is used to create a reference to a DOM element so that we can interact with it programmatically.
Continue with defining a local variable bottomRef
in the ChatMessages
component, to create a reference to the bottom of the chat window:
const bottomRef = useRef(null);
Then create the scrollToBottom
function, which will be responsible for scrolling the chat window to the bottom:
const scrollToBottom = () => {
if (bottomRef.current) {
if (typeof bottomRef.current.scrollIntoViewIfNeeded === 'function') {
bottomRef.current.scrollIntoViewIfNeeded({ behavior: 'smooth' });
} else {
bottomRef.current.scrollIntoView({ behavior: 'smooth' });
}
}
};
This function first checks if bottomRef.current
is defined. If it is, it then checks if the scrollIntoViewIfNeeded
function is available.
If available, it smoothly scrolls the chat window to the bottom using scrollIntoViewIfNeeded
. scrollIntoViewIfNeeded is a non-standard method supported only by some browsers (Firefox, for example, doesn't implement it). If it's not available, the function falls back to scrollIntoView, which is more widely supported, to achieve the same effect.
Next, add a useEffect
hook that triggers the scrollToBottom
function whenever the messages
prop changes:
useEffect(() => {
scrollToBottom();
}, [messages]);
This will ensure that the chat window always scrolls to the bottom when new messages are added to the conversation.
Then finally create the components where the chat messages will be displayed in the return
statement of ChatMessages
:
return(
<Container>
<Box sx={{ width: '100%', mt: 4, maxHeight: 300, minHeight: 300, overflow: 'auto' }}>
<Paper elevation={0} sx={{ padding: 2 }}>
<List>
{messages.map((message, index) => (
<ListItem key={index} sx={{ padding: 0 }}>
<ListItemText
sx={{ margin: 0 }}
primary={
<MessageWrapper align={message.role}>
{message.role === 'user' ? (
<>
<UserMessage theme={theme} audio={message.audio}>
{message.text}
{message.audio && (
<IconButton
size="small"
sx={{
position: 'absolute',
top: '50%',
right: 8,
transform: 'translateY(-50%)'
}}
onClick={() => message.audio.play()}
>
<VolumeUpIcon fontSize="small" />
</IconButton>
)}
</UserMessage>
</>
) : (
<AgentMessage theme={theme}>
{message.text}
</AgentMessage>
)}
</MessageWrapper>
}
/>
</ListItem>
))}
</List>
</Paper>
</Box>
</Container>
)
Lastly, add the bottomRef
to your List
component to make the auto-scrolling functionality work:
// ........
</ListItem>
))}
<div ref={bottomRef} /> // Add this ref
</List>
</Paper>
// ........
By adding the bottomRef
to an empty <div>
at the end of the List
component, we can now programmatically scroll the chat window to the bottom whenever new messages are added to the conversation.
Let's break down what we're doing in the ChatMessages
component in detail.
We start by defining the ChatMessages
component, which takes the messages
prop. We also use the useTheme
hook to access the Material-UI theme:
const ChatMessages = ({messages}) => {
const theme = useTheme()
We then wrap the chat area with a Container
component. Inside the container, we use a Box
component with specific styles for width, margin, maximum height, and overflow. This ensures that the chat area has a fixed height and scrolls if there are more messages than can fit in the available space.
We then use a Paper component with an elevation of 0 to remove the raised effect, so the chat area sits flush with the background. We also add some padding to the Paper component.
Inside the Paper
component, we use a List
component to hold the chat messages:
{messages.map((message, index) => (
<ListItem key={index} sx={{ padding: 0 }}>
<ListItemText
sx={{ margin: 0 }}
primary={
<MessageWrapper align={message.role}>
We iterate over the messages array and create a ListItem component for each message, setting its padding to 0 and providing a unique key using the index. We then use the ListItemText component to display the message content.
We conditionally align the message based on the role
using the MessageWrapper
component. The MessageWrapper
component uses the align
prop to justify the content to either
- flex-end for user messages or
- flex-start for agent messages
{message.role === 'user' ? (
<>
<UserMessage theme={theme} audio={message.audio}>
{message.text}
{message.audio && (
<IconButton
size="small"
sx={{
position: 'absolute',
top: '50%',
right: 8,
transform: 'translateY(-50%)',
}}
onClick={() => message.audio.play()}
>
<VolumeUpIcon fontSize="small" />
</IconButton>
)}
</UserMessage>
</>
) : (
<AgentMessage theme={theme}>
{message.text}
</AgentMessage>
)}
We conditionally apply the UserMessage
or AgentMessage
styling based on the role
.
We pass the Material-UI theme
and the audio
prop, if available, to the UserMessage
component. If the message has associated audio, we display an IconButton
component with the VolumeUpIcon
. The IconButton
has an onClick
event that plays the audio when clicked.
The same structure is applied to the AgentMessage
component. The styling for the AgentMessage
is slightly different, but the functionality remains the same.
In summary, the ChatMessages
component is responsible for displaying chat messages in a styled, scrollable area. It takes an array of messages and iterates over them, creating a list of messages aligned based on the role
, user or agent.
It also displays an audio icon for messages with associated audio, allowing users to play the audio by clicking the icon.
Now we're ready to include the ChatMessages component in the return statement of the App function. Your return statement should now look like this:
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
</Container>
)
Your app should now look like this with a greeting message:
Let's go ahead and create the audio controls in the next segment.
3.3 Create the AudioControls
The next step is to create the audio controls:
Start by importing the MicRecorder
library:
import MicRecorder from 'mic-recorder-to-mp3';
Then, go ahead and define the function outside the App
function and create four new state variables:
const AudioControls = () => {
const [isRecording, setIsRecording] = useState(false);
const [recorder, setRecorder] = useState(null);
const [player, setPlayer] = useState(null);
const [audioFile, setAudioFile] = useState(null);
}
AudioControls is placed outside of the App function to encapsulate its state and logic, making the component reusable and easier to maintain. This separation of concerns also helps prevent unnecessary re-renders of the App component when state changes occur within the AudioControls component. By defining the AudioControls component outside of the App function, you can more efficiently manage the state related to recording, playing, and uploading audio, keeping your application modular and organized.
We'll have four buttons in the AudioControls
component:
1. Start a recording
2. Stop the recording
3. Play the recording
4. Upload audio
For the icon buttons, we'll need a microphone icon and a dot, import those icon components:
import FiberManualRecordIcon from '@mui/icons-material/FiberManualRecord';
import MicIcon from '@mui/icons-material/Mic';
Also, import the Button
component from MUI
:
import {
Button, // Add to imports
Container,
Grid,
IconButton,
List,
ListItem,
ListItemText,
Paper,
Typography
} from "@mui/material";
Let's create the function for starting an audio recording inside the AudioControls
function:
const startRecording = async () => {
const newRecorder = new MicRecorder({ bitRate: 128 });
try {
await newRecorder.start();
setIsRecording(true);
setRecorder(newRecorder);
} catch (e) {
console.error(e);
alert(e)
}
};
Let's break down what we're doing in the function. We're declaring an asynchronous function using async
:
const startRecording = async () => {
This allows us to use the keyword await
within the function to handle the Promise from MicRecorder
.
The next step is to create a new instance of MicRecorder with a bitRate of 128 kbps. The bitRate option specifies the quality of the recorded audio: a higher bitrate means better quality but a larger file size:
const newRecorder = new MicRecorder({ bitRate: 128 });
Then we're calling the start()
method on the newRecorder
instance to start recording in a try
block:
try {
await newRecorder.start();
The await
keyword is used with newRecorder.start()
to pause the function's execution until the Promise resolves or rejects.
If the audio recording starts successfully, the Promise resolves and we proceed to update the React component's state:
setIsRecording(true);
setRecorder(newRecorder);
- The setIsRecording(true) call sets the isRecording state to true, indicating that the recording is in progress.
- The setRecorder(newRecorder) call sets the recorder state to the newRecorder instance, so it can be used later to stop the recording.
If the start()
method fails, which could be due to permission issues or the microphone being unavailable, then the catch
block gets executed:
catch (e) {
console.error(e);
alert(e)
}
This block logs the error and shows an alert
so you can troubleshoot the issue.
Let's also create the function for stopping the audio recording:
const stopRecording = async () => {
if (!recorder) return;
try {
const [buffer, blob] = await recorder.stop().getMp3();
const audioFile = new File(buffer, "voice-message.mp3", {
type: blob.type,
lastModified: Date.now(),
});
setPlayer(new Audio(URL.createObjectURL(audioFile)));
setIsRecording(false);
setAudioFile(audioFile); // Add this line
} catch (e) {
console.error(e);
alert(e)
}
};
Here's a breakdown of what we did, starting with declaring the function as asynchronous with the async keyword so we can handle Promises:
const stopRecording = async () => {
Then we added the try
block to attempt to stop the recording and get the MP3 data:
try {
const [buffer, blob] = await recorder.stop().getMp3();
The await
keyword is used with recorder.stop().getMp3()
to pause the function's execution until the Promise is resolved or rejected.
If the Promise is resolved, the buffer
and blob
variables are assigned values returned by the getMp3()
method.
Then we converted the recorded audio into an MP3 file:
const audioFile = new File(buffer, 'voice-message.mp3', {
type: blob.type,
lastModified: Date.now(),
});
In this code, the File
constructor is used to create a new File
object with the audio data, the name voice-message.mp3
, the appropriate file type and the last-modified timestamp.
The MP3 file is then used to create a new Audio
object, which can be played back:
setPlayer(new Audio(URL.createObjectURL(audioFile)));
The URL.createObjectURL(audioFile) method creates a URL representing the file, and the new Audio() constructor creates a new Audio object using that URL. The setPlayer(...) call then updates the React component's player state with this new Audio object.
In the next step, we update the React component's state:
setIsRecording(false);
The setIsRecording(false) call sets the isRecording state to false, indicating that the recording is no longer in progress.
If the stop().getMp3()
method fails, which could be due to an issue with the recorder, the catch
block is executed:
catch (e) {
console.error(e);
alert(e);
}
Let's also create the function for playing a recording:
const playRecording = () => {
if (player) {
player.play();
}
};
Now that we have the audio control functions ready, we can create the return statement of the AudioControls component:
return (
<Container>
<Box sx={{ width: "100%", mt: 4 }}>
<Grid container spacing={2} justifyContent="flex-end">
<Grid item xs={12} md>
<IconButton
color="primary"
aria-label="start recording"
onClick={startRecording}
disabled={isRecording}
>
<MicIcon />
</IconButton>
</Grid>
<Grid item xs={12} md>
<IconButton
color="secondary"
aria-label="stop recording"
onClick={stopRecording}
disabled={!isRecording}
>
<FiberManualRecordIcon />
</IconButton>
</Grid>
<Grid item xs="auto">
<Button
variant="contained"
disableElevation
onClick={playRecording}
disabled={isRecording}
>
Play Recording
</Button>
</Grid>
</Grid>
</Box>
</Container>
)
We're ready to include the AudioControls
component in the return
statement of the App
function, and your return
statement should now look like this:
return(
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
<AudioControls />
</Container>
)
If you look at your app, you'll see that one button is missing: upload audio:
We'll create it in the coming sections, but first, let's create the logic for switching between audio and text.
3.4 Create the audio response toggle
In this section, we'll build the ResponseFormatToggle
, which allows users to decide if they want an audio response in addition to the text response:
Just like we did with AudioControls, we'll place the ResponseFormatToggle outside of the App function to encapsulate its state and logic, making the component reusable and easier to maintain.
First, add the isAudioResponse
and setIsAudioResponse
variables to your main state:
const [isAudioResponse, setIsAudioResponse] = useState(false);
Next, create the ResponseFormatToggle
component outside of the App
function and pass the variables as props:
const ResponseFormatToggle = ({ isAudioResponse, setIsAudioResponse }) => {
}
Define the function for handling the toggle change in the ResponseFormatToggle
function:
const handleToggleChange = (event) => {
setIsAudioResponse(event.target.checked);
};
We'll need to import two new MUI
components; FormControlLabel
and Switch
:
import {
Button,
Container,
FormControlLabel, // Add to imports
Grid,
IconButton,
List,
ListItem,
ListItemText,
Paper,
Switch, // Add to imports
Typography
} from "@mui/material";
Now, create the component for the toggle, and your ResponseFormatToggle
should now look like this:
const ResponseFormatToggle = ({ isAudioResponse, setIsAudioResponse }) => {
const handleToggleChange = (event) => {
setIsAudioResponse(event.target.checked);
};
return (
<Box sx={{ display: "flex", justifyContent: "center", marginTop: 2 }}>
<FormControlLabel
control={
<Switch
checked={isAudioResponse}
onChange={handleToggleChange}
color="primary"
/>
}
label="Audio response"
/>
</Box>
);
};
Finally, add the ResponseFormatToggle
to the return
statement of the App
function:
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
<AudioControls />
<ResponseFormatToggle isAudioResponse={isAudioResponse} setIsAudioResponse={setIsAudioResponse} />
</Container>
);
Your app should now display a functioning toggle button:
With the toggle button in place, we're ready to create the missing upload button, the SendButton component:
3.5 Create the upload button
The SendButton
is part of the AudioControls
component and is responsible for uploading the audio file to the backend.
To keep the user informed while the audio is being sent and processed in the backend, we'll create a new component, ThinkingBubble
, that pulses while the chatbot is "thinking".
Both ThinkingBubble and SendButton are placed outside of the App function to encapsulate their state and logic, making the components reusable and easier to maintain.
To create the pulse
motion, we'll need to import keyframes
from MUI
:
import {
keyframes, // Add this import
styled
}
from '@mui/system';
Then define the pulse
motion outside of your App
function:
const pulse = keyframes`
0% {
transform: scale(1);
opacity: 1;
}
50% {
transform: scale(1.1);
opacity: 0.7;
}
100% {
transform: scale(1);
opacity: 1;
}
`;
We'll use the MoreHorizIcon
for the thinking bubble, so import it from MUI
:
import MoreHorizIcon from '@mui/icons-material/MoreHoriz';
Now, create the ThinkingBubbleStyled component with the pulse animation, placing it below the pulse definition:
const ThinkingBubbleStyled = styled(MoreHorizIcon)`
animation: ${pulse} 1.2s ease-in-out infinite;
margin-bottom: -5px;
`;
Finally, create the ThinkingBubble
component:
const ThinkingBubble = () => {
const theme = useTheme();
return <ThinkingBubbleStyled theme={theme} sx={{ marginBottom: '-5px' }} />;
};
The ThinkingBubble is styled with MUI, so it needs access to the theme.
Now we're ready to create the SendButton
component, begin by defining it with a useTheme
hook:
const SendButton = ({audioFile}) => {
const theme = useTheme();
}
Continue by creating a function in the SendButton for uploading the audio file to the backend, which starts with a check for whether an audio file exists:
const uploadAudio = async () => {
if (!audioFile) {
console.log("No audio file to upload");
return;
}
}
Before we add the backend API call, let's create a helper function that builds the message objects needed for the ChatGPT prompt. Add this function in the main App.js file, since we'll use it in components both within and outside of the App function:
function filterMessageObjects(list) {
return list.map(({ role, content }) => ({ role, content }));
}
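For example (with illustrative values only), given messages that also carry text, audio and id keys, the helper strips everything except what the ChatGPT API expects:
// Example: only the role and content keys survive the filtering
const example = [
  { role: "user", content: "Hi there", text: "Hi there", audio: null, id: 1681395200000 },
];
console.log(filterMessageObjects(example));
// -> [ { role: 'user', content: 'Hi there' } ]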
Make sure to add filterMessageObjects as a prop; SendButton should now have two props:
const SendButton = ({audioFile, filterMessageObjects}) => {
// .....
}
This function maps over the messages and creates a new array with only the role and content keys. For the backend call itself, we'll use Amplify, which we installed earlier. Go ahead and import the library:
import { API } from "aws-amplify";
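Note that API.post needs to know which REST endpoint the name "api" refers to. We'll get the actual endpoint URL once the backend is deployed in the later sections; for reference, a minimal configuration sketch (assuming aws-amplify's REST API category, with a placeholder URL) looks roughly like this:
import { Amplify } from "aws-amplify";

// Minimal sketch, assuming the aws-amplify REST API category.
// "api" must match the first argument passed to API.post.
Amplify.configure({
  API: {
    endpoints: [
      {
        name: "api",
        endpoint: "https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/dev", // placeholder URL
      },
    ],
  },
});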
The next step is adding the async
backend call and your uploadAudio
function should now look like this:
const uploadAudio = async () => {
if (!audioFile) {
console.log("No audio file to upload");
return;
}
try {
const reader = new FileReader();
reader.onloadend = async () => {
const base64Audio = reader.result;
// Add a unique id to the message to be able to update it later
const messageId = new Date().getTime();
// Create the message objects
let messageObjects = filterMessageObjects(messages)
// Add user's audio message to the messages array
setMessages((prevMessages) => [
...prevMessages,
{ role: "user", content: "🎤 Audio Message", audio: new Audio(base64Audio), text: "🎤 Audio Message", id: messageId },
]);
// Add thinking bubble
setMessages((prevMessages) => [
...prevMessages,
{ role: "assistant", content: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />, text: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />, key: "thinking" },
]);
const response = await API.post("api", "/get-answer", {
headers: {
"Content-Type": "application/json",
},
body: {
audio: base64Audio,
messages: messageObjects,
isAudioResponse
},
});
// Remove the thinking bubble
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
};
reader.readAsDataURL(audioFile);
} catch (error) {
console.error("Error uploading audio file:", error);
alert(error)
}
};
Let's break down how the uploadAudio
function is built and examine each step in detail:
1. Check if an audio file exists
The function starts by checking if an audioFile
exists. If not, it logs a message and returns early to prevent further execution.
if (!audioFile) {
console.log("No audio file to upload");
return;
}
2. Create a FileReader instance
A new FileReader
instance is created to read the audio file's content. The reader.onloadend
event is used to handle the file reading completion. It's an async event to ensure that the reading process is complete before proceeding:
const reader = new FileReader();
reader.onloadend = async () => {
// ... remaining steps
};
3. Convert the audio file to Base64
The reader.result
contains the audio file's content in Base64 format. This is needed for further processing and transmitting the file to the backend:
const base64Audio = reader.result;
4. Generate a unique message ID
To uniquely identify the user's audio message, generate an ID based on the current timestamp. We'll use this ID later to replace the placeholder content with the transcription once the backend has processed the audio file:
const messageId = new Date().getTime();
5. Create message objects
Use the filterMessageObjects
helper function to create an array containing only the role
and content
keys for each message:
let messageObjects = filterMessageObjects(messages);
6. Add the user's audio message
Update the messages
array with the new audio message, including its role, content, audio, text, and the unique ID:
setMessages((prevMessages) => [
...prevMessages,
{
role: "user",
content: "🎤 Audio Message",
audio: new Audio(base64Audio),
text: "🎤 Audio Message",
id: messageId,
},
]);
The unique ID is used later to update the
content
key with the transcribed audio message from the backend
7. Add the thinking bubble
Display the ThinkingBubble
component to indicate that the chatbot is processing the user's input:
setMessages((prevMessages) => [
...prevMessages,
{
role: "assistant",
content: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />,
text: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />,
key: "thinking",
},
]);
We add the key thinking so we can find this object and remove it from the array later.
8. Make the backend call
Use the API.post
method from Amplify to send the Base64 audio file, message objects, and the isAudioResponse
flag to the backend for processing:
const response = await API.post("api", "/get-answer", {
headers: {
"Content-Type": "application/json",
},
body: {
audio: base64Audio,
messages: messageObjects,
isAudioResponse,
},
});
9. Remove the thinking bubble
Once the response is received, remove the ThinkingBubble
component from the message array:
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
10. Read the audio file
Lastly, initiate the process of reading the audio file using the reader.readAsDataURL(audioFile)
method:
reader.readAsDataURL(audioFile);
Let's update the SendButton
component to include the necessary isAudioResponse
, messages
and setMessages
as props:
const SendButton = ({
audioFile,
isAudioResponse,
filterMessageObjects,
messages,
setMessages}) => {
// .....
}
Let's also create the Button component. We'll need the CloudUploadIcon, so start by importing it and then add the Button component to the return statement of the SendButton:
import CloudUploadIcon from "@mui/icons-material/CloudUpload";
return (
<Grid item xs="auto">
<Button
variant="contained"
color="primary"
disableElevation
onClick={uploadAudio}
disabled={!audioFile}
startIcon={<CloudUploadIcon />}
>
Upload Audio
</Button>
</Grid>
);
Now that the SendButton
component is complete, incorporate it into the AudioControls
component created earlier:
const AudioControls = () => {
// startRecording ...
// stopRecording ...
// playRecording ...
// ...
return (
// ...
<Grid item xs="auto">
<Button
variant="contained"
disableElevation
onClick={playRecording}
disabled={isRecording}
>
Play Recording
</Button>
</Grid>
<SendButton audioFile={audioFile} isAudioResponse={isAudioResponse} filterMessageObjects={filterMessageObjects}
messages={messages}
setMessages={setMessages} /> {/* Add the SendButton component */}
// ....
Since SendButton requires the props isAudioResponse, filterMessageObjects, messages and setMessages, make sure to include them both in the return statement of the App function, where AudioControls is rendered:
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
<AudioControls isAudioResponse={isAudioResponse} filterMessageObjects={filterMessageObjects} messages={messages} setMessages={setMessages} />
<ResponseFormatToggle isAudioResponse={isAudioResponse} setIsAudioResponse={setIsAudioResponse} />
</Container>
);
Also add isAudioResponse, filterMessageObjects, messages and setMessages as props of AudioControls:
const AudioControls = ({isAudioResponse, filterMessageObjects, messages, setMessages}) => {
// ....
}
With these updates, your SendButton
component receives the necessary props and is now integrated into the AudioControls
component.
Your app should now have an Upload Audio
button:
Now you have a functional SendButton
component that uploads the audio file to the backend and displays a ThinkingBubble
component while the chatbot processes the user's input. Once the response is received, the ThinkingBubble
is removed, and the assistant's response is displayed.
3.6 Create the message input
For this guide, we're giving the users the option to send both audio and text messages. Let's create the last component, the MessageInput
, which will allow users to type and send text messages.
Start by defining a message
variable in the main App
function:
// Main app
const [message, setMessage] = useState("");
Then continue with defining the component outside of the App
function:
const MessageInput = () => {
}
This component will need to send the isAudioResponse
flag to the backend, so add it as props:
const MessageInput = ({isAudioResponse}) => {
}
Also, add the variables message
and setMessage
as props:
const MessageInput = ({message, setMessage, isAudioResponse}) => {
}
Next, create a function to handle the text input change, and place this function inside the MessageInput
function:
const handleInputChange = (event) => {
setMessage(event.target.value);
};
Now, add a function that sends the text message to the backend, and place it inside the App
function:
const handleSendMessage = async () => {
if (message.trim() !== "") {
// Send the message to the chat
// Add the new message to the chat area
setMessages((prevMessages) => [
...prevMessages,
{ role: "user", content: message, text: message, audio: null },
]);
// Clear the input field
setMessage("");
// Add thinking bubble
setMessages((prevMessages) => [
...prevMessages,
{ role: "assistant", content: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />, text: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />, key: "thinking" },
]);
// Create backend chat input
let messageObjects = filterMessageObjects(messages)
messageObjects.push({ role: "user", content: message })
// Create endpoint for just getting the completion
try {
// Send the text message to the backend
const response = await API.post("api", "/get-answer", {
headers: {
"Content-Type": "application/json",
},
body: {
text: message,
messages: messageObjects,
isAudioResponse
},
});
// Remove the thinking bubble
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
} catch (error) {
console.error("Error sending text message:", error);
alert(error);
}
}
};
The handleSendMessage
function uses the theme
so let's add a useTheme
hook to access the MUI theme in the main App
function:
const theme = useTheme();
Let's break down what we're doing in handleSendMessage
and examine each step in detail:
1. Check if the message is not empty
The function starts by checking if the message
is not an empty string (ignoring leading and trailing whitespaces). If it's empty, the function will not process further:
if (message.trim() !== "") {
// ... remaining steps
}
2. Add the user's text message
Update the messages
array with the new text message, including its role, content, text and audio:
setMessages((prevMessages) => [
...prevMessages,
{ role: "user", content: message, text: message, audio: null },
]);
3. Clear the input field
Clear the input field to allow the user to enter a new message after the response:
setMessage("");
4. Add the thinking bubble
Display the ThinkingBubble
component to indicate that the chatbot is processing the user's input.
setMessages((prevMessages) => [
...prevMessages,
{
role: "assistant",
content: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />,
text: <ThinkingBubble theme={theme} sx={{ marginBottom: '-5px' }} />,
key: "thinking",
},
]);
5. Create message objects
Use the filterMessageObjects
helper function to create an array containing only the role
and content
keys for each message. Then, push the new text message into the array:
let messageObjects = filterMessageObjects(messages);
messageObjects.push({ role: "user", content: message });
6. Make the backend API call
Use the API.post
method from Amplify to send the text message, message object, and the isAudioResponse
flag to the backend for processing:
const response = await API.post("api", "/get-answer", {
headers: {
"Content-Type": "application/json",
},
body: {
text: message,
messages: messageObjects,
isAudioResponse,
},
});
7. Remove the thinking bubble
Once the response is received, remove the ThinkingBubble
component from the messages array:
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
8. Catch any errors
If there are any errors while sending the text message to the backend, log the error message and show an alert
:
catch (error) {
console.error("Error sending text message:", error);
alert(error);
}
The handleSendMessage
function is now handling sending the text message, updating the UI with a thinking bubble, and making a backend API call to process the user's input.
To add functionality for listening to a key event within the MessageInput
component, define the handleKeyPress
function:
const handleKeyPress = (event) => {
if (event.key === "Enter") {
handleSendMessage();
}
};
The handleKeyPress
function checks if the Enter
key is pressed. If so, it calls the handleSendMessage
function, triggering the message-sending process.
Add the handleSendMessage
as props in MessageInput
, and it should now look like this:
const MessageInput = ({ message, setMessage, isAudioResponse, handleSendMessage }) => {
// ....
}
We now just need to add a TextField
so the user can use it to type and send a text message. Start by importing the TextField
component from MUI
:
import {
Button,
Container,
FormControlLabel,
Grid,
IconButton,
List,
ListItem,
ListItemText,
Paper,
Switch,
TextField, // Add to imports
Typography
} from "@mui/material";
And then import the SendIcon
:
import SendIcon from "@mui/icons-material/Send";
Then add the TextField
and IconButton
within the return
statement of the MessageInput
component:
return (
<Box sx={{ display: "flex", alignItems: "center", marginTop: 2 }}>
<TextField
variant="outlined"
fullWidth
label="Type your message"
value={message}
onChange={handleInputChange}
onKeyPress={handleKeyPress}
/>
<IconButton
color="primary"
onClick={() => handleSendMessage(isAudioResponse)}
disabled={message.trim() === ""}
>
<SendIcon />
</IconButton>
</Box>
);
Lastly, add the MessageInput
component in the return
statement above the ResponseFormatToggle
in your App
function:
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
<AudioControls isAudioResponse={isAudioResponse} filterMessageObjects={filterMessageObjects} messages={messages} setMessages={setMessages} />
<MessageInput message={message} setMessage={setMessage} isAudioResponse={isAudioResponse} handleSendMessage={handleSendMessage} />
<ResponseFormatToggle isAudioResponse={isAudioResponse} setIsAudioResponse={setIsAudioResponse} />
</Container>
);
If you check your app, you should now see a text input field where you can type a text message:
3.7 Create the backend response handling
Before we can start to build the backend, there is one last function we'll need to build; handleBackendResponse
. This function is responsible for transforming the backend response into the format required by the ChatMessages
component and is placed inside the App
function.
Start by defining the function:
const handleBackendResponse = (response, id = null) => {
}
We have two arguments: the backend response and id
. The id
is used to track the user message when it is an audio file and has been transcribed.
Whenever a user sends an audio message, the placeholder chat message is 🎤 Audio Message. Once the audio has been transcribed into text, we want to add the transcription to the messages array so we can keep track of what the user said to the chatbot. That's why we keep track of the chat message id.
The backend response will have three keys:
- The generated text (the ChatGPT answer)
- The generated audio (if isAudioResponse is true)
- The transcription of the user's message
Create local variables of each response key:
const generatedText = response.generated_text;
const generatedAudio = response.generated_audio;
const transcription = response.transcription;
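As a purely illustrative example, a backend response is expected to look something like the object below (the audio string is truncated here):
// Illustrative shape of the backend response handled by handleBackendResponse
const exampleResponse = {
  generated_text: "Sure! Here's a quick summary...",
  generated_audio: "SUQzBAAAAAAA...", // base64-encoded mp3, only present when isAudioResponse is true
  transcription: "Can you summarize that for me?", // only meaningful for audio messages
};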
Next, let's create an audio element if generated audio is present:
const audioElement = generatedAudio
? new Audio(`data:audio/mpeg;base64,${generatedAudio}`)
: null;
Now, create an AudioMessage
component. This chat message can be clicked on by the user if audio is present:
const AudioMessage = () => (
<span>
{generatedText}{" "}
{audioElement && (
<IconButton
aria-label="play-message"
onClick={() => {
audioElement.play();
}}
>
<VolumeUpIcon style={{ cursor: "pointer" }} fontSize="small" />
</IconButton>
)}
</span>
);
The final step is to add a conditional statement for updating the messages
array, put it below the AudioMessage
component:
if (id) {
setMessages((prevMessages) => {
const updatedMessages = prevMessages.map((message) => {
if (message.id && message.id === id) {
return {
...message,
content: transcription,
};
}
return message;
});
return [
...updatedMessages,
{
role: "assistant",
content: generatedText,
audio: audioElement,
text: <AudioMessage />,
},
];
});
} else {
// Simply add the response when no messageId is involved
setMessages((prevMessages) => [
...prevMessages,
{
role: "assistant",
content: generatedText,
audio: audioElement,
text: <AudioMessage />,
},
]);
}
Let's break down the conditional statement within the handleBackendResponse
function and examine each step in detail:
1. Check if id
is present
The conditional statement checks if the id
argument is provided. If id
is present, it means the message is an audio transcription, and we need to update the existing message with the transcribed text. If id
is not present, we directly add the chatbot's response to the messages
array:
if (id) {
// ... update existing message and add chatbot's response
} else {
// ... directly add the chatbot's response
}
2. Update the existing message with the transcription
If id
is present, we iterate through the messages
array using the map
function. For each message, if the message's id
matches the provided id
, we create a new message object with the same properties and update its content
with the transcription:
const updatedMessages = prevMessages.map((message) => {
if (message.id && message.id === id) {
return {
...message,
content: transcription,
};
}
return message;
});
3. Add the chatbot's response to the updated messages array
Next, we add the chatbot's response, including the generated text, audio element, and AudioMessage
component, to the updateMessages
array:
return [
...updatedMessages,
{
role: "assistant",
content: generatedText,
audio: audioElement,
text: <AudioMessage />,
},
];
4. Set the updated messages array
The setMessages
function is called with the updated messages array, which contains the transcribed message and the chatbot's response:
setMessages((prevMessages) => {
// ... logic for updating messages and adding chatbot's response
});
5. Directly add the chatbot's response when no id
is involved
If the id
is not present, we don't need to update any existing messages. Instead, we directly add the chatbot's response, including the generated text, audio element and AudioMessage
component, to the messages
array:
setMessages((prevMessages) => [
...prevMessages,
{
role: "assistant",
content: generatedText,
audio: audioElement,
text: <AudioMessage />,
},
]);
The entire process ensures that the messages array is updated correctly, whether the user input is a transcribed audio message or a simple text message.
Finally, you'll need to call the handleBackendResponse
function in two locations within your code:
1. After removing the thinking bubble in the SendButton component
Add handleBackendResponse
as a prop and call the function:
const SendButton = ({ audioFile, isAudioResponse, handleBackendResponse, filterMessageObjects, messages, setMessages }) => {
// ......
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
handleBackendResponse(response, messageId); // Add function call
// ......
2. After removing the thinking bubble in the handleSendMessage function
Add a call to the handleBackendResponse
function:
const handleSendMessage = async () => {
// ......
setMessages((prevMessages) => {
return prevMessages.filter((message) => message.key !== "thinking");
});
handleBackendResponse(response); // Add function call
// ......
}
After adding handleBackendResponse as a prop, update the AudioControls component:
const AudioControls = ({ isAudioResponse, handleBackendResponse, messages, filterMessageObjects, setMessages }) => {
// ....
<SendButton
audioFile={audioFile}
isAudioResponse={isAudioResponse}
handleBackendResponse={handleBackendResponse} // Add handleBackendResponse
filterMessageObjects={filterMessageObjects}
messages={messages}
setMessages={setMessages} />
}
Also, update the return
statement to this:
return (
<Container maxWidth="sm" sx={{ pt: 2 }}>
<ChatHeader />
<ChatMessages messages={messages} />
<AudioControls isAudioResponse={isAudioResponse} filterMessageObjects={filterMessageObjects} messages={messages} setMessages={setMessages} handleBackendResponse={handleBackendResponse} />
<MessageInput message={message} setMessage={setMessage} isAudioResponse={isAudioResponse} handleSendMessage={handleSendMessage} handleBackendResponse={handleBackendResponse} />
<ResponseFormatToggle isAudioResponse={isAudioResponse} setIsAudioResponse={setIsAudioResponse} />
</Container>
);
We're all set to start building our backend in Python.
4. Create an AWS account
In this guide, we'll use AWS Lambda for the Python backend, powered by AWS API Gateway to handle the REST calls. We'll create the Lambda with our Python code using the Serverless framework.
To begin, you'll need to create a new AWS account if you don't already have one.
1. Visit https://aws.amazon.com and click Sign In to the Console:
2. Click Create a new AWS account:
3. Complete the signup process:
Important
Before proceeding, create a billing alarm to ensure you receive a notification if your bill increases unexpectedly.
Follow these steps to set up a billing alarm: AWS docs
4. Next, create a user on your account. In the top menu, type IAM, then click on IAM from the dropdown:
5. Click on Users in the left menu:
6. Click on Add users:
7. Choose a username for the new user and click Next:
8. Set up permissions for the new user. Click on Attach policies directly:
9. Scroll down and type admin in the search field, then select AdministratorAccess:
10. Scroll to the bottom of the page and click Next:
11. Review the policies, then scroll down to the bottom of the page and click Create user:
12. Click on the user you just created:
13. Click on Security credentials:
14. In the Security credentials menu, scroll down to the Access keys section and click Create access key:
15. Choose Command Line Interface and scroll down to the bottom of the page, then click Next:
16. Optionally, add tags for the new user, then click Create access key:
17. You've now reached the final step of creating a new IAM user. Be sure to save the access key and the secret access key:
Either copy the keys and store them in a secure location or download the CSV file. This is crucial, since you won't be able to reveal the secret access key again after this step.
We'll configure your AWS user in the next step, so make sure to have both the access key and the secret access key available.
5. Set up AWS CLI and configure your account
In this section, we'll guide you through installing the AWS Command Line Interface (CLI) and configuring it with your AWS account.
5.1 Install AWS CLI
First, you'll need to install the AWS CLI on your computer.
Follow the installation instructions for your operating system: AWS docs
After the installation is complete, you can verify that the AWS CLI is installed by running the following command in your command prompt:
aws --version
You should see an output similar to this:
aws-cli/2.3.0 Python/3.8.8 Linux/4.14.193-113.317.amzn2.x86_64 botocore/2.0.0
5.2 Configure your AWS CLI
Now that the AWS CLI is installed, you'll need to configure it with your AWS account. Make sure you have your access key and the secret access key from the previous section.
Run the following command in your terminal or command prompt:
aws configure
You'll be prompted to enter your AWS credentials:
- AWS Access Key ID [None]:
Enter your access key
- AWS Secret Access Key [None]:
Enter your secret access key
Next, you'll need to specify a default region and output format. The region is where your AWS resources will be created. Choose the region that's closest to you or your target audience.
You can find a complete list of available regions and their codes in the AWS documentation: https://docs.aws.amazon.com/general/latest/gr/rande.html
For example:
- Default region name [None]:
Enter your desired region code, such as us-east-1
- Default output format [None]:
Enter the output format you prefer, such as json
Your AWS CLI is now configured, and you can start using it to interact with your AWS account.
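If you'd like to double-check that the credentials work, you can optionally run the following command, which should print the account ID and ARN of the user you just configured:
aws sts get-caller-identity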
In the next section, we'll create an AWS Lambda function and configure it with the necessary resources to power your chatbot's backend.
6. Set up a Serverless project with a handler.py file
This section will guide you through creating a new Serverless project with a handler.py
file using the Serverless Framework. The handler.py
file will contain the code for your AWS Lambda function, which will power your chatbot's backend.
6.1 Install the Serverless Framework
First, you need to install the Serverless Framework on your computer. Make sure you have Node.js
installed and then run the following command in your terminal or command prompt:
npm install -g serverless
After the installation is complete, verify that the Serverless Framework is installed by running the following command:
serverless --version
You should see output similar to this:
2.71.1
6.2 Create a new Serverless project
Now that the Serverless Framework is installed, you can create a new Serverless project. First, navigate back to the parent folder we created for the project earlier (my-chatbot-project in this example):
cd my-chatbot-project
Then run the following command in your terminal or command prompt:
serverless create --template aws-python3 --path backend
We're passing --path backend here to create the new project in a folder called backend.
Then navigate to the new backend
folder by running:
cd backend
Inside the folder, you'll find two files:
- handler.py
: This is the file that contains your AWS Lambda function code
- serverless.yml
: This is the configuration file for your Serverless project, which defines the resources, function, and events in your application
6.3 Configure the serverless.yml file
In this section we'll walk through the serverless.yml
file configuration, explaining the purpose of each part.
Open the serverless.yml
file in your favorite text editor or IDE. You'll need to customize this file to define your chatbot's backend resources, function, and events.
Replace the current code with the following:
service: your-service-name

provider:
  name: aws
  runtime: python3.9
  stage: dev
  region: us-east-1

plugins:
  - serverless-python-requirements

functions:
  chatgpt-audio-chatbot:
    handler: handler.handler
    timeout: 30
    events:
      - http:
          path: get-answer
          method: post
          cors: true
Let's break down and explain the purpose of each part.
1. Service name
service: your-service-name
This line defines the name of your service, which is used by the Serverless Framework to group related resources and functions. In this case, make sure to replace your-service-name
with your own name.
2. Provider configuration
provider:
  name: aws
  runtime: python3.9
  stage: dev
  region: us-east-1
This section specifies the cloud provider, in our case AWS, and sets up some basic configurations:
- name
: The cloud provider for your service aws
- runtime
: The runtime for your Lambda function python3.9
- stage
: The stage of your service deployment dev
- you can use different stages for different environments (e.g. development, staging, production)
- region
: The AWS region where your service will be deployed us-east-1
. Make sure to select a region that supports the required services and is closest to your users for lower latency
3. Plugins
plugins:
  - serverless-python-requirements
This section lists any Serverless Framework plugins you want to use in your project. In this case, we're using the serverless-python-requirements
plugin to automatically package and deploy any Python dependencies your Lambda function requires.
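The plugin needs to be installed in the backend project before deployment; it's typically added as a dev dependency, for example:
npm install --save-dev serverless-python-requirements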
4. Functions
functions:
  chatgpt-audio-chatbot:
    handler: handler.handler
    timeout: 30
    events:
      - http:
          path: get-answer
          method: post
          cors: true
This section defines the Lambda functions in your service:
- chatgpt-audio-chatbot
: The name of the Lambda function
- handler
: The reference to the function within your handler.py
file handler.handler
- this tells the Serverless Framework to use the handler
function defined in the handler.py
file
- timeout
: The maximum time your Lambda function is allowed to run before it's terminated, in seconds. We've set it to 30 seconds.
- events
: The events that trigger your Lambda function. In this case, we've set up an HTTP event, which is triggered by a POST request to the /get-answer
endpoint. The cors: true
setting enables CORS (Cross-Origin Resource Sharing) for this endpoint, allowing requests from different origins (e.g. your frontend application)
Now that you have a better understanding of the serverless.yml
file, you can customize it to suit the future needs of your chatbot's backend.
In the next section, we'll walk through implementing the Lambda function in the handler.py
file.
7. Create the python backend
7.1 Import necessary libraries
Open up the handler.py
file, delete all the prewritten code and let's start by importing the necessary libraries:
import json
import base64
import io
from openai import OpenAI
import requests
from requests.exceptions import Timeout
We'll need json to parse and handle JSON data from incoming requests and to generate JSON responses. base64 will be used to encode and decode the audio data sent and received in requests. The io library is necessary for handling the in-memory file-like objects used in the audio transcription process. The openai library enables us to interact with the OpenAI API for transcribing and generating text, while requests will be used to make HTTP requests to the ElevenLabs API for converting text to speech.
7.2 Add your OpenAI API key
Next, let's add our OpenAI API key. Here are the steps for getting your OpenAI API key if you don't already have it.
Go to https://beta.openai.com/, log in and click on your avatar and View API keys:
Then create a new secret key and save it for the request:
Remember that you'll only be able to reveal the secret key once, so make sure to save it somewhere for the next step.
Then add your API key to the handler.py file below the library imports:
api_key = "sk-YOUR_API_KEY"
And initialize the OpenAI client with your API key:
# Initialize OpenAI
openai_client = OpenAI(
    api_key=api_key
)
7.3 Create function to transcribe audio
For the user audio, we'll need to create a function that takes the audio data as input and returns the transcribed text using the OpenAI API. This function will be called transcribe_audio
and will accept a single argument, audio_data
.
Add the function to handler.py
:
def transcribe_audio(audio_data):
    # Convert the audio data into a file-like object using io.BytesIO
    with io.BytesIO(audio_data) as audio_file:
        audio_file.name = "audio.mp3"  # Add a name attribute to the BytesIO object
        # Use the OpenAI API to transcribe the audio, specifying the model, file, and language
        response = openai_client.audio.transcriptions.create(model="whisper-1", file=audio_file, language="en")
        # Extract the transcribed text from the response
        transcription = response.text
    return transcription
In this function, we first create a file-like object using io.BytesIO
by passing in the audio_data
. We then add a name attribute to the BytesIO
object to indicate that it is an MP3 file.
Next, we call the openai_client.audio.transcriptions.create
method, providing the model whisper-1
, the audio_file
object, and specifying the language as en
(English).
The API call returns a response containing the transcribed text, which we extract and return from the function.
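If you'd like to sanity-check this function locally before wiring it into the Lambda handler, you can feed it the raw bytes of any short audio file. Here's a minimal sketch, assuming sample.mp3 is a placeholder for a recording on your machine and that the OpenAI client is initialized with your API key as shown above:
# Quick local test for transcribe_audio (run handler.py directly, outside Lambda)
if __name__ == "__main__":
    with open("sample.mp3", "rb") as f:  # placeholder: any short MP3 recording
        audio_bytes = f.read()
    print(transcribe_audio(audio_bytes))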
7.4 Create function to generate a text reply
Once we have the audio transcribed, we'll need to create a function that calls the OpenAI API to generate a chat completion based on the user message.
Let's create the function generate_chat_completion
to achieve this:
def generate_chat_completion(messages):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100,  # Optional, change to desired value
    )
    return response.choices[0].message.content
Now, let's break down the function:
1. The generate_chat_completion
function takes a single argument, messages
, which is a list of message objects from the frontend.
2. We call the openai_client.chat.completions.create
method to generate a chat completion using the gpt-3.5-turbo
model. We pass the messages
list as an argument to the method.
The messages
are formatted in the frontend and should be a list of dictionaries, each containing a role
, either system
, user
, or assistant
, and content
, which is the message text. We've also added the max_tokens
parameter and set it to 100.
When generating chat completions using GPT, you might want to limit the length of the responses to prevent excessively long answers. You can do this by setting the max_tokens parameter when making the API call. In our generate_chat_completion function, we've added the max_tokens parameter and set it to 100. By setting max_tokens to 100, we limit the response to a maximum of 100 tokens. You can adjust this value according to your requirements. Keep in mind that if you set it too low, the generated text might be cut off and not make sense to users. Experiment with different values to find the best balance between response length and usability.
3. The API call returns a response that contains a list of choices, with each choice representing a possible chat completion. In our case, we simply select the first choice response.choices[0]
.
4. Finally, we extract the content of the message from the first choice using response.choices[0].message.content
.
With this function in place, we can now generate a text reply based on the transcribed user audio and any other messages provided in the messages
list.
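To make the expected input format concrete, here's a small example of calling generate_chat_completion with a hand-built messages list. The system prompt and user question are just placeholders:
# Example: the messages list mirrors what the frontend sends (roles: system, user, assistant)
example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
reply = generate_chat_completion(example_messages)
print(reply)  # e.g. "The capital of France is Paris."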
7.5 Create function to generate audio from text
Now that we have the text reply generated by our chatbot, we might want to convert it back to audio if the flag isAudioResponse
is true
. For this, we'll create a function called generate_audio
that uses the ElevenLabs API to synthesize speech from the generated text.
ElevenLabs has a generous free tier with API access - just remember to add an attribution to elevenlabs.io when on the free tier:
Screenshot from March 2023
Start by creating a free ElevenLabs account, if you don't already have one. Visit https://beta.elevenlabs.io/ and click Sign up:
Then click on the avatar in the upper right corner and click Profile:
Copy your API key and have it available for the next step when we're calling the ElevenLabs API:
The last step is to get a voice id for the API call. Go back to your dashboard, click Resources and then API:
Click to expand the documentation for text-to-speech:
Here you'll find a voice id we'll use when synthesizing the audio in our backend, copy and save the voice id for the next steps:
Let's finally create the function to synthesize speech from the generated text:
def generate_audio(generated_text):
    # API key
    api_key = "YOUR_API_KEY"
    # Voice id
    voice_id = "21m00Tcm4TlvDq8ikWAM"
    # Voice params
    data = {
        "text": generated_text,
        "voice_settings": {
            "stability": 0,
            "similarity_boost": 0
        }
    }
    # Call endpoint
    url = f'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?api_key={api_key}'
    headers = {
        'accept': 'audio/mpeg',
        'Content-Type': 'application/json'
    }
    # Wrap the request in a try block with a 15 second timeout
    try:
        response = requests.post(url, headers=headers, json=data, timeout=15)
    except Timeout:
        # Return None if the ElevenLabs API doesn't respond in time
        return None
    # Bytes type is not JSON serializable
    # Convert to a Base64 string
    return base64.b64encode(response.content).decode('utf-8')
Let's break down this function step by step:
1. We define the API key and voice ID as variables. Replace YOUR_API_KEY with the ElevenLabs API key we just generated:
api_key = "YOUR_API_KEY"
voice_id = "21m00Tcm4TlvDq8ikWAM"
2. We create a dictionary called data
that contains the generated text and voice settings. The text
key contains the text that we want to convert to speech. The voice_settings
key is a dictionary containing options for controlling the stability and similarity of the generated voice:
data = {
"text": generated_text,
"voice_settings": {
"stability": 0,
"similarity_boost": 0
}
}
3. We define the API endpoint URL using the voice ID and the API key. The URL includes the base endpoint, https://api.elevenlabs.io/v1/text-to-speech/
followed by the voice ID and the API key as a query parameter:
url = f'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?api_key={api_key}'
4. We set up the HTTP headers for our API request. The accept
header indicates that we expect the response to be in the audio/mpeg
format, while the Content-Type
header specifies that we will send JSON data in our request:
headers = {
'accept': 'audio/mpeg',
'Content-Type': 'application/json'
}
5. We then use the requests.post
method to make a POST request to the API endpoint, passing the headers and JSON data as arguments. The API call returns a response containing the synthesized audio data:
response = requests.post(url, headers=headers, json=data, timeout=15)
Try block for timeout: In some cases, the ElevenLabs API request takes a long time, causing the API Gateway to time out while waiting for a response. To handle this, we've added a timeout of 15 seconds to the generate_audio function. This ensures that our application does not hang indefinitely while waiting for a response from the API, and provides a more predictable user experience. If the API does not respond within 15 seconds, the request is terminated and the function returns None. We wrap the request in a try block and catch the requests.exceptions.Timeout exception.
6. Since the audio data is in bytes format, which is not JSON serializable, we need to convert it to a Base64
string. We use the base64.b64encode
method to do this and then decode the result to a UTF-8 string using the decode
method:
base64.b64encode(response.content).decode('utf-8')
7. Finally, we return the Base64-encoded audio data as the output of the function.
With this generate_audio
function, we can now convert the text reply generated by our chatbot back into an audio format that can be played by the user.
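To double-check the audio locally, you can decode the Base64 string that generate_audio returns and write it to a file you can play. A minimal sketch, assuming reply.mp3 as an arbitrary output file name:
# Decode the Base64 string returned by generate_audio and save it as a playable MP3
audio_b64 = generate_audio("Hello! This is a quick voice test.")
if audio_b64 is not None:  # generate_audio returns None if the ElevenLabs request times out
    with open("reply.mp3", "wb") as f:
        f.write(base64.b64decode(audio_b64))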
7.6 Create the handler function to tie everything together
Finally, we need to create the main handler function that will be triggered by the API Gateway event. This function will tie together all the other functions we've created, allowing us to process the incoming request, transcribe audio, generate chat completions, and create audio responses.
Add the handler
function to your handler.py
file:
def handler(event, context):
    try:
        body = json.loads(event["body"])
        if 'audio' in body:
            audio_base64 = body["audio"]
            audio_data = base64.b64decode(audio_base64.split(",")[-1])
            transcription = transcribe_audio(audio_data)
            message_objects = body['messages'] + [{"role": "user", "content": transcription}]
        elif 'text' in body:
            transcription = body['text']
            message_objects = body['messages']
        else:
            raise ValueError("Invalid request format. Either 'audio' or 'text' key must be provided.")
        generated_text = generate_chat_completion(message_objects)
        # Check if audio response
        is_audio_response = body.get('isAudioResponse', False)
        if is_audio_response:
            generated_audio = generate_audio(generated_text)
        else:
            generated_audio = None
        response = {
            "statusCode": 200,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps(
                {"transcription": transcription, "generated_text": generated_text, "generated_audio": generated_audio}),
        }
        return response
    except ValueError as ve:
        import traceback
        print(traceback.format_exc())
        print(f"ValueError: {str(ve)}")
        response = {
            "statusCode": 400,
            "body": json.dumps({"message": str(ve)}),
        }
        return response
    except Exception as e:
        import traceback
        print(traceback.format_exc())
        print(f"Error: {str(e)}")
        response = {
            "statusCode": 500,
            "body": json.dumps({"message": "An error occurred while processing the request."}),
        }
        return response
Let's break down this handler
function step by step:
1. We start by defining the handler function with two arguments: event
and context
. The event
object contains the data from the API Gateway event, and context
contains runtime information:
def handler(event, context):
2. We then extract the body
from the event
object by loading it as a JSON object:
body = json.loads(event["body"])
3. We then check if the body contains an audio key. If it does, we decode the base64-encoded audio data and transcribe it using the transcribe_audio function. The split(",")[-1] call strips the data:...;base64, prefix, if the frontend sends the recording as a data URL, leaving just the Base64 payload. We create a message_objects list by combining the existing messages from the frontend data with the transcribed message:
if 'audio' in body:
    audio_base64 = body["audio"]
    audio_data = base64.b64decode(audio_base64.split(",")[-1])
    transcription = transcribe_audio(audio_data)
    message_objects = body['messages'] + [{"role": "user", "content": transcription}]
4. If the body
contains a text
key instead, we simply use the text provided and create the message_objects
list from the frontend data:
elif 'text' in body:
    transcription = body['text']
    message_objects = body['messages']
5. If neither audio
nor text
keys are present, we raise a ValueError
to indicate that the request format is invalid:
else:
    raise ValueError("Invalid request format. Either 'audio' or 'text' key must be provided.")
6. We then call the generate_chat_completion
function, passing the message_objects
list as an argument. This returns the generated text response from our Chatbot:
generated_text = generate_chat_completion(message_objects)
7. We check if the body
contains an isAudioResponse
key and use its value to determine if we should generate an audio response from the generated text:
is_audio_response = body.get('isAudioResponse', False)
8. If an audio response is requested from the frontend, we call the generate_audio
function to convert the generated text back to audio. If not, we set generated_audio
to None
:
if is_audio_response:
    generated_audio = generate_audio(generated_text)
else:
    generated_audio = None
9. We create a response
dictionary with the following keys:
- statusCode: The HTTP status code for the response. We set it to 200, indicating a successful operation.
- headers: The HTTP headers to include in the response. We set the Access-Control-Allow-Origin header to * to enable cross-origin requests.
- body: The response body, which we serialize as a JSON object. The response body contains the following keys:
  - transcription: The transcribed text from the user's audio input
  - generated_text: The generated text response from the chatbot
  - generated_audio: The generated audio response if requested, encoded as a base64 string:
response = {
"statusCode": 200,
"headers": {"Access-Control-Allow-Origin": "*"},
"body": json.dumps(
{"transcription": transcription, "generated_text": generated_text, "generated_audio": generated_audio}),
}
10. We return the response
dictionary:
return response
11. If a ValueError
occurs, e.g., due to an invalid request format, we catch the exception, print the traceback, and return a 400 status code along with an error message:
except ValueError as ve:
    import traceback
    print(traceback.format_exc())
    print(f"ValueError: {str(ve)}")
    response = {
        "statusCode": 400,
        "body": json.dumps({"message": str(ve)}),
    }
    return response
12. If any other exception occurs, we catch the exception, print the traceback, and return a 500 status code along with a generic error message:
except Exception as e:
    import traceback
    print(traceback.format_exc())
    print(f"Error: {str(e)}")
    response = {
        "statusCode": 500,
        "body": json.dumps({"message": "An error occurred while processing the request."}),
    }
    return response
With the handler
function complete, we now have a fully functional backend for our chatbot that can handle text and audio input, generate chat completions using OpenAI and return text or audio responses as needed.
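Before deploying, you can also simulate an API Gateway invocation locally by calling handler with a hand-built event dictionary. This is only a rough sketch for local testing; the event below contains just the fields our handler reads, not everything API Gateway actually sends:
# Simulate an API Gateway proxy event locally (only the fields our handler reads)
test_event = {
    "body": json.dumps({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "text": "What is the capital of France?",
        "isAudioResponse": False
    })
}
result = handler(test_event, None)
print(result["statusCode"])
print(json.loads(result["body"])["generated_text"])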
8. Deploying the backend to AWS
Now that we have our chatbot backend implemented in the handler.py
file, it's time to deploy it to AWS using the Serverless Framework. In this section, we'll go through the deployment process step by step.
8.1 Ensure AWS credentials are configured
Before deploying, ensure that you have properly set up your AWS credentials on your local machine. If you haven't done this yet, refer to section 6.2 for a detailed guide on setting up your AWS credentials.
8.2 Install dependencies
Before deploying the backend, we need to install the required Python packages. In your backend
folder, create a requirements.txt
file and add the following dependencies:
openai
requests
8.3 Install and configure the serverless-python-requirements plugin
Before deploying the Serverless project, you need to ensure that you have the serverless-python-requirements
plugin installed and configured. This plugin is essential for handling your Python dependencies and packaging them with your Lambda function.
To install the plugin, run the following command in your project directory:
npm install --save serverless-python-requirements
This command will add the plugin to your project package.json
file and install it in the node_modules
folder.
With the plugin in place, the Serverless Framework will automatically package the Python dependencies listed in requirements.txt and include them in the deployment.
8.4 Deploy the backend
Now that we have our AWS credentials configured and our dependencies installed, it's time to deploy the backend. Open a terminal, navigate to your backend
folder located in your project folder, and run the following command:
serverless deploy
This command will package and deploy your Serverless service to AWS Lambda. The deployment process might take a few minutes. Once the deployment is complete, you'll see output similar to this:
Service Information
service: chatgpt-audio-chatbot
stage: dev
region: us-east-1
stack: chatgpt-audio-chatbot-dev
endpoints:
POST - https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/get-answer
functions:
chatgpt-audio-chatbot: chatgpt-audio-chatbot-dev-chatgpt-audio-chatbot
Take note of the endpoints
section, as it contains the API Gateway URL for your deployed Lambda function. You'll need this URL in the next section when we'll make requests to your chatbot backend from the frontend.
8.5 Locating the deployed Lambda function in the AWS console
Once your backend is successfully deployed, you may want to explore and manage your Lambda function using the AWS Console. In this section, we'll guide you through the process of finding your deployed Lambda function in the AWS Console.
1. Sign in to your AWS Management Console: https://aws.amazon.com/console/
2. Under the "Services" menu, navigate to "Lambda" or use the search bar to find and select "Lambda" to open the AWS Lambda Console.
3. In the AWS Lambda Console, you'll see a list of all the Lambda functions deployed in the selected region. The default function name will be in the format service-stage-function, where service is the service name defined in your serverless.yml file, stage is the stage you deployed to (e.g., dev), and function is the function name you defined in the same file.
For example, if your serverless.yml has the following configurations:
service: chatgpt-audio-chatbot
...
functions:
chatgpt-audio-chatbot:
handler: handler.handler
The Lambda function will have a name like chatgpt-audio-chatbot-dev-chatgpt-audio-chatbot
.
4. Click on the Lambda function name in the list to view its details, configuration, and monitoring information. On the Lambda function details page, you can:
- Edit the function code in the inline code editor (for smaller functions), or download the deployment package to make changes offline.
- Modify environment variables, memory, timeout, and other settings.
- Add triggers, layers, or destinations.
- View monitoring data, such as invocation count, duration, and error rate in the Monitoring tab.
- Access CloudWatch Logs to view and search the function's logs in the Monitoring tab, by clicking on View logs in CloudWatch
5. Additionally, you can navigate to the API Gateway console to view and manage the API Gateway that's integrated with your Lambda function:
- In the AWS Management Console, search for API Gateway under the Services menu or use the search bar.
- Select the API Gateway that corresponds to your serverless.yml configuration (e.g., chatgpt-audio-chatbot-dev if your service name is chatgpt-audio-chatbot and the stage is dev).
- In the API Gateway Console, you can view and manage resources, methods, stages, and other settings for your API. You can also test the API endpoints directly from the console.
By following these steps, you can locate, manage, and monitor your deployed Lambda function and other AWS resources from the AWS Management Console. This allows you to better understand your application's performance, troubleshoot any issues, and make any necessary updates to the backend as needed.
8.6 Test the deployed backend
To ensure that your backend is working correctly, you can use a tool like Postman or curl to send a test request to the API Gateway URL. Replace https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com with the API Gateway URL you received when you deployed the backend:
For a text-based request:
curl -X POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/get-answer \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "system", "content": "You are a helpful assistant."}],
"text": "What is the capital of France?",
"isAudioResponse": false
}'
For an audio-based request, replace your_base64_encoded_audio_string with an actual Base64 encoded audio string:
curl -X POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/get-answer \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "system", "content": "You are a helpful assistant."}],
"audio": "your_base64_encoded_audio_string",
"isAudioResponse": false
}'
You should receive a response containing the transcription of the user's input, the generated text from the chatbot, and (optionally) the generated audio if isAudioResponse is set to true.
{
"transcription": "What is the capital of France?",
"generated_text": "The capital of France is Paris.",
"generated_audio": null
}
If you receive an error, double-check your request payload and ensure that your Lambda function has the correct permissions and environment variables set.
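If you prefer Python over curl, you can send the same test requests with the requests library. This is just a sketch: replace the URL with your own API Gateway endpoint, and sample.mp3 is a placeholder for a local recording if you want to exercise the audio path:
import base64
import requests

API_URL = "https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/get-answer"  # your endpoint here

# Text-based request
text_payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "text": "What is the capital of France?",
    "isAudioResponse": False,
}
print(requests.post(API_URL, json=text_payload, timeout=30).json())

# Audio-based request: Base64-encode a local recording first
with open("sample.mp3", "rb") as f:  # placeholder audio file
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

audio_payload = {
    "messages": [{"role": "system", "content": "You are a helpful assistant."}],
    "audio": audio_b64,
    "isAudioResponse": False,
}
print(requests.post(API_URL, json=audio_payload, timeout=30).json())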
9. Update the frontend
Now that your backend is deployed and working correctly, let's update the frontend application to use the API Gateway URL. We'll leverage AWS Amplify to configure the API call and make it easy to interact with our backend.
First, open the App.js
file in your frontend project. Import Amplify
from aws-amplify
:
import {
Amplify, // Add to imports
API
} from "aws-amplify";
Just before the function App()
, add the Amplify configuration, including the API endpoint you received when you deployed the backend:
Amplify.configure({
// OPTIONAL - if your API requires authentication
Auth: {
mandatorySignIn: false,
},
API: {
endpoints: [
{
name: "api",
endpoint: "https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/dev"
}
]
}
});
Make sure to replace xxxxxxxxxx
with the actual endpoint from the backend deploy.
With your backend deployed and your frontend updated, your ChatGPT Audio Chatbot is now ready to use!
Let's try it out, here's how it works:
10. Redeploying after changes
If you make any changes to your backend code or serverless.yml configuration, you can redeploy your service by running serverless deploy again. The Serverless Framework will update your AWS resources accordingly.
Remember to test your backend after each deployment to ensure everything is working as expected.
That's it! You have successfully created and deployed a ChatGPT Audio Chatbot using OpenAI, AWS Lambda, and the Serverless Framework. Your chatbot is now ready to receive and respond to both text and audio-based requests.
The source code
Do you want the full source code? This tutorial is quite extensive, and following along step-by-step may be time-consuming.
Visit this page to download the entire source code, you'll get instant access to the files, which you can use as a reference or as a starting point for your own voice-powered ChatGPT bot project.
Improvements
Protecting the Lambda API Endpoint: Currently, our Lambda function is openly accessible, which can lead to potential misuse or abuse. To secure the API endpoint, you can use Amazon API Gateway's built-in authentication and authorization mechanisms. One such mechanism is Amazon Cognito, which provides user sign-up and sign-in functionality, as well as identity management.
By integrating Amazon Cognito with your API Gateway, you can ensure that only authenticated users have access to your chatbot API. This not only secures your API but also enables you to track and manage user access, providing a more robust and secure experience.
In summary, leveraging Amazon Cognito for authentication is an excellent way to protect your Lambda API Endpoint and enhance the security of your chatbot application.
Error Handling: The chatbot application could benefit from more comprehensive error handling. This would involve checking for error responses from the text-to-speech API, the speech-to-text API, and the Lambda function, and gracefully displaying relevant error messages to the user. This would help users understand any issues encountered during their interaction with the chatbot.
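For instance, on the backend you could check the HTTP status code of the ElevenLabs response before Base64-encoding it, so failures are logged to CloudWatch instead of being returned as broken audio. The snippet below is only an illustrative sketch, not part of the tutorial code; generate_audio_checked is a hypothetical variant of our generate_audio function that takes the same url, headers, and data values:
import base64
import requests
from requests.exceptions import Timeout

def generate_audio_checked(url, headers, data):
    """Sketch: variant of generate_audio that surfaces ElevenLabs API errors."""
    try:
        response = requests.post(url, headers=headers, json=data, timeout=15)
    except Timeout:
        print("ElevenLabs request timed out")
        return None
    if response.status_code != 200:
        # Log the error body so it shows up in CloudWatch, then signal failure to the caller
        print(f"ElevenLabs error {response.status_code}: {response.text}")
        return None
    return base64.b64encode(response.content).decode("utf-8")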
Saving Chat History to a Database: Currently, the chat history between the user and the chatbot is stored in the application's state, which means that the messages disappear when the page is refreshed. To preserve the chat history, you can save the conversation to a database. This can be achieved using a variety of database solutions, such as Amazon DynamoDB or MongoDB.
Storing chat history in a database provides additional benefits, such as the ability to analyze user interactions for further improvements, track user satisfaction, and monitor the chatbot's performance.
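As a rough illustration of the DynamoDB route, the Lambda function could persist each exchange with boto3. This sketch relies on assumptions that aren't part of the tutorial: a DynamoDB table named ChatHistory with a conversation_id partition key and a timestamp sort key, plus dynamodb:PutItem permission on the Lambda's IAM role:
import time
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ChatHistory")  # hypothetical table name

def save_exchange(conversation_id, user_message, assistant_message):
    """Persist one user/assistant exchange (sketch; assumes the table described above)."""
    table.put_item(
        Item={
            "conversation_id": conversation_id,    # partition key
            "timestamp": int(time.time() * 1000),  # sort key
            "user_message": user_message,
            "assistant_message": assistant_message,
        }
    )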
By implementing these improvements, you can enhance the security, user experience, and functionality of your chatbot application, making it more robust and reliable for real-world use.
Questions
Get in Touch for Assistance or Questions: Do you need help implementing the ChatGPT chatbot, or do you have any other questions related to this tutorial? I'm more than happy to help. Don't hesitate to reach out by sending an email to norah@braine.ai
Alternatively, feel free to shoot me a DM on Twitter @norahsakal.
I look forward to hearing from you and assisting with your ChatGPT chatbot journey!