Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
By Fireworks AI Team|8/14/2024
DeepSeek V3, a state-of-the-art open model, is now available. Try it now!
By Fireworks AI Team|8/14/2024
Large Language Models have revolutionized how we retrieve information or build search systems. Retrieval-augmented generation (RAG) methodology has become a common way to access or extract information.
This guide teaches you how to build a Retrieval-Augmented Generation application using SurrealDB, Fireworks, FastAPI, and Astro. By the end of this guide, you will be able to update the chatbot’s knowledge visually and obtain the latest and personalized responses to your queries.
You'll need the following:
The following technologies are used in creating our RAG application:
This is a high-level architecture of how data is flowing and operations that take place 👇🏻
💡 You can find the code for the application in the Github Repo.
You can find various methods to install and run the SurrealDB server in the documentation. Let's opt for installing SurrealDB using its dedicated install script for our scenario. In your terminal window, execute the following command:
The above command attempts to install the latest version of SurrealDB (per your platform and CPU type) into the /usr/local/bin
folder in your system.
Once that is done, execute the following command in your terminal window:
The above command does the following:
0.0.0.0:4304
network address.root
.mydatabase.db
to persist data on your filesystem.Model inference requests to the Fireworks API require an API Key. To generate this API key, log in to your Fireworks account and navigate to API Keys. Enter a name for your API key and click the Create Key button to generate a new API key. Copy and securely store this token for later use as FIREWORKS_API_KEY
environment variable.
Locally, set and export the FIREWORKS_API_KEY
environment variable by executing the following command:
First, let's start by creating a new project. You can create a new directory by executing the following command in your terminal window:
Next, you can install the required dependencies by executing the following command in your terminal window:
The above command installs the required libraries to run ASGI Server, FastAPI, Fireworks AI, SurrealDB and LangChain in your Python project.
Next, create a file main.py
with the following code:
The above code imports the following:
os
module to use the environment variable you’ve set earlier.List
to denote a list of elements of specific type.BaseModel
class to define models of the request body FastAPI endpoints.StreamingResponse
class to generate streaming responses from FastAPI endpoints.CORSMiddleware
FastAPI middleware to enable Cross Origin Resource Sharing of FastAPI endpoints.fireworks.client
SDK for conveniently accessing Fireworks supported LLMs.SurrealDBStore
class by LangChain to use SurrealDB as vector store.FireworksEmbeddings
class via LangChain Fireworks integration to use Nomic AI Embeddings Model.To create the data types of request body in your FastAPI endpoints, append the following code in main.py file:
The above code defines three Pydantic models:
messages
.role
and content
.Message
model.To set the Fireworks API key used by Fireworks AI module internally, append the following code in main.py file:
The above code uses the os
module to load the environment variable FIREWORKS_API_KEY
as Firework’s API Key.
To use FireworksEmbeddings
class to create an embeddings generator using the nomic-ai/nomic-embed-text-v1.5
, append the following code in main.py file:
To define the SurrealDB vector store configuration, append the following code in main.py file:
The above code uses the following values to establish a SurrealDB Vector Store with LangChain:
ws://localhost:4304/rpc
as the database URL to establish a WebSocket connection with SurrealDB. Using a WebSocket connection allows to send and receive messages from SurrealDB using the WebSocket API.root
as both the username and password of the SurrealDB instance.vectors
as the collection name of the vector store to and from which the relevant vectors will be inserted and queried from.embeddings
generator as the embedding function.To initialize a FastAPI application, append the following code in main.py file:
The code above creates a FastAPI instance and uses the CORSMIddleware
middleware to enable Cross Origin requests. This allows your frontend to successfully POST to the RAG application endpoints to fetch responses to the user query, regardless of the port it is running on.
To update application’s knowledge in realtime by generating vector embeddings and inserting them into SurrealDB, you’ll create an /update
endpoint in your FastAPI application. Append the following code in main.py file:
update(messages: LearningMessages)
method -
messages
containing comma (,) separated messages to be inserted in your SurrealDB vector store.metadata
list, each item being length of each message received as input.ids
list, each item being a randomly generated id for each message received as input.embeddings
generator passed as the embeddings function, it generates the vector embedding of each message. Alongwith each message’s metadata, it inserts the vector embedding into the SurrealDB vector store.To generate personalized responses that uses the application’s existing knowledge, you’ll create an /chat
endpoint in your FastAPI application. Append the following code in main.py file:
chat(messages: Messages)
method -
Message
model as messages
.Message
, which represents a user query.Message
model, representing role of the system and it’s content as the system prompt created.yield_content
function.The yield_content
function loops over each Document (received as the similar vector with it’s metadata), and streams the content
value of it as part of the API response.
With all that done, here’s how our main.py
will finally look like containing both the endpoints:
Execute the following command in another terminal window:
💡 Use Python virtual environments, to avoid conflicts with other packages. Simply run
./venv/bin/uvicorn main:app --reload
to make a clear distinction between global and local environments.
The app should be running on localhost:8000. Let’s keep it running while we create an user interface to invoke the endpoints to create responses to user queries.
Let’s get started by creating a new Astro project. Open your terminal and run the following command:
npm create astro
is the recommended way to scaffold an Astro project quickly.
When prompted, choose the following:
Empty
when prompted on how to start the new project.Yes
when prompted whether to write Typescript.Strict
when prompted how strict Typescript should be.Yes
when prompted to whether install dependencies.Yes
when prompted to whether initialize a git repository.Once that’s done, you can move into the project directory and start the app:
The app should be running on localhost:4321. Let's close the development server as we move on to integrate TailwindCSS into the application.
For styling the app, you will be using Tailwind CSS. Install and set up Tailwind CSS at the root of our project's directory by running:
When prompted, choose:
Yes
when prompted to install the Tailwind dependencies.Yes
when prompted to generate a minimal tailwind.config.mjs
file.Yes
when prompted to make changes to Astro configuration file.With choices as above, the command finishes integrating TailwindCSS into your Astro project. It installed the following dependency:
tailwindcss
: TailwindCSS as a package to scan your project files to generate corresponding styles.@astrojs/tailwind
: The adapter that brings Tailwind's utility CSS classes to every .astro
file and framework component in your project.To create reactive interfaces quickly, let’s move onto integrating React in your application.
To prototype the reactive user interface quickly, you are gonna use React as the library with Astro. In your terminal window, execute the following command:
npx
allows us to execute npm packages binaries without having to first install it globally.
When prompted, choose the following:
Yes
when prompted whether to install the React dependencies.Yes
when prompted whether to make changes to Astro configuration file.Yes
when prompted whether to make changes to tsconfig.json
file.To create conversation user interface easily, let’s move onto installing an AI SDK in your application.
In your terminal window, run the command below to install the necessary library for building the conversation user interface:
The above command installs the following:
ai
library to build AI-powered streaming text and chat UIs.axios
library to make HTTP requests.Inside src
directory, create a Chat.jsx
file with the following code:
chat.jsx
does the following:
useChat
hook by ai
SDK to manage the conversation between user and the application. It takes care of saving the entire conversation (on the client-side) and using them as the request body when it calls the user defined api
endpoint to fetch the response from chatbot.<input>
element to allow users to enter their query. It then loops over all the messages in the entire conversation, including the latest response to the user query.Now, let’s create a component that will allow the user to supply some strings to the application to take into consideration before it answers any of the user query.
Inside src
directory, create a Update.jsx
file with the following code:
Update.jsx
-
axios
library and useState
hook by React.<textarea>
element to allow users to enter multiple strings, wherein each string is represented between comma(s).http://localhost:8000/update
endpoint.src/pages/index.astro
file:The changes above being with importing both the Chat and Update React components. Then, it uses Astro's client:load
directive to make sure that both the components are loaded and hydrated immediately on the page.
Run your Astro application by executing the following command in another terminal window:
The app should be running on localhost:4321.
Congratulations, you created a Retrieval-Augmented Generation application using SurrealDB and Fireworks AI. With SurrealDB’s vector store, you are able to insert and update vector embeddings on the fly over WebSockets, and perform similarity search to user queries using vector embeddings generated internally for you.
Further, using Fireworks AI, you are able to invoke Llama 3.1 70B Chat model with system context and generate personalized responses to user queries.