The Real Python Podcast

Leveraging Documents and Data to Create a Custom LLM Chatbot

04.05.2024 - By Real Python

How do you customize an LLM chatbot to draw on a collection of documents and data? What tools and techniques can you use to turn that content into embeddings stored in a vector database? This week on the show, Calvin Hendryx-Parker is back to discuss developing an AI-powered chat interface driven by a large language model.
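To make the embeddings-in-a-vector-database idea concrete, here's a minimal sketch that uses only the Python standard library. It isn't the code from the episode and it isn't how ChromaDB works internally: a real setup would use an embedding model, while this stand-in treats word counts as vectors and ranks stored documents by cosine similarity. The names `embed`, `ToyVectorStore`, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store playing the role a vector database like ChromaDB fills."""

    def __init__(self) -> None:
        self._items: list[tuple[str, str, Counter]] = []  # (doc_id, text, vector)

    def add(self, doc_id: str, text: str) -> None:
        # "Vectorize" each document once, when it's added.
        self._items.append((doc_id, text, embed(text)))

    def query(self, question: str, n_results: int = 2) -> list[tuple[str, float]]:
        # Embed the question the same way, then rank documents by similarity.
        q_vec = embed(question)
        scored = [(doc_id, cosine_similarity(q_vec, vec)) for doc_id, _, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:n_results]


store = ToyVectorStore()
store.add("corn-trials", "Corn hybrid yield trial results from test plots")
store.add("soybean-density", "Soybean planting density research brochure")
store.add("wheat-disease", "Winter wheat disease resistance notes")
best_id, score = store.query("Which corn hybrid had the best yield?", n_results=1)[0]
print(best_id)  # prints: corn-trials
```

The design choice worth noticing is that documents and questions go through the same `embed` function, so they land in the same vector space and can be compared directly.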

Calvin is the co-founder and CTO of Six Feet Up, a Python and AI consultancy. He shares a recent project for a family-owned seed company that wanted to build a tool for customers to access years of farm research. These documents were stored as brochure-style PDFs and spanned 50 years.

We discuss several of the tools used to augment an LLM. Calvin covers working with LangChain and vectorizing data with ChromaDB. We also talk about the obstacles and limitations of capturing the contents of documents.
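Augmenting an LLM with retrieved documents usually ends with placing the retrieved passages into the prompt. The sketch below assumes the passages have already been retrieved (for example, from a vector database) and shows one way to build a prompt that keeps the model grounded and lets it link back to its sources. `Chunk`, `build_rag_prompt`, the prompt wording, and the example.com URLs are all hypothetical; LangChain ships its own prompt templates for this, which aren't shown here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage plus the link it came from (hypothetical structure)."""
    text: str
    source_url: str


def build_rag_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved passage and attach its source so the model can cite it."""
    passages = [
        f"[{i}] {chunk.text}\nSource: {chunk.source_url}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "Cite passages by number and include their source links. "
        "If the context doesn't contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder passages; in practice these come back from the vector search step.
chunks = [
    Chunk("Placeholder text from a 2019 trial brochure.", "https://example.com/trials-2019.pdf"),
    Chunk("Placeholder text from a planting guide.", "https://example.com/planting-guide.pdf"),
]
print(build_rag_prompt("Which hybrid suits sandy soil?", chunks))
```

Keeping the source URL next to each passage is what makes it possible for the chatbot's answer to include links back to the original documents.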

Calvin also shares a smaller project that you can try out yourself. It takes the information from a conference website and creates a chatbot using Django and Python prompt-toolkit.

This episode is sponsored by Mailtrap.

Course Spotlight: Command Line Interfaces in Python

Command line arguments are the key to converting your programs into useful and enticing tools that are ready to be used in the terminal of your operating system. In this course, you’ll learn their origins, standards, and basics, and how to implement them in your program.

Topics:

00:00:00 – Introduction

00:02:21 – Background on the project

00:03:51 – Complexity of adding documents

00:09:01 – Retrieval-augmented generation and providing links

00:13:46 – Updating information and larger conversation context

00:18:08 – Sponsor: Mailtrap

00:18:43 – Working with context

00:21:02 – Temperature adjustment

00:22:07 – Rally Conference Chatbot Project

00:26:20 – Vectorization using ChromaDB

00:32:49 – Employing Python prompt-toolkit

00:35:07 – Learning libraries on the fly

00:37:38 – Video Course Spotlight

00:39:00 – Problems with tables in documents

00:42:30 – Everything looks like a chat box

00:44:26 – Finding the right fit for a client and customer

00:49:05 – What are questions you ask a new client now?

00:51:54 – Air Canada anecdote

00:56:20 – How do you stay up to date on these topics?

01:01:03 – What are you excited about in the world of Python?

01:03:22 – What do you want to learn next?

01:04:58 – How can people follow your work online?

01:05:31 – IndyPy

01:07:13 – Thanks and goodbye
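The temperature adjustment topic refers to the sampling setting that most LLM APIs expose. As a rough sketch of the underlying math, assuming the common formulation where the model's raw token scores (logits) are divided by the temperature before a softmax: lower values sharpen the distribution toward the most likely token, and higher values flatten it. The function name and logits here are made up for illustration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities, scaled by a sampling temperature."""
    # Some APIs accept 0 and treat it as (near-)greedy decoding;
    # this sketch requires a positive value to avoid dividing by zero.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the top token's probability shrinking as the temperature rises, which is why a lower temperature is often preferred when a chatbot should stick closely to retrieved facts.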

Show Links:

Transforming Agricultural Data with AI — Six Feet Up

Build ChatGPT-like Apps with AI — Six Feet Up

Innovate with AI: Build ChatGPT-like Apps - YouTube

What is retrieval-augmented generation? - IBM Research Blog

rally-llm-presentation - sixfeetup - GitHub

Python Prompt Toolkit 3.0 — Documentation

Chroma - the AI-native open-source embedding database

Embeddings and Vector Databases With ChromaDB – Real Python

LangChain

Build an LLM RAG Chatbot With LangChain – Real Python

Air Canada must pay after chatbot lies to grieving passenger - The Register

I’d Buy That for a Dollar: Chevy Dealership’s AI Chatbot Goes Rogue

Omnivore

TLDR AI - Get smarter about AI in 5 minutes

Tech Brew

Simon Willison’s Weblog

llm: Access large language models from the command-line - simonw - GitHub

PyCon US 2024

Syntorial: The Ultimate Synthesizer Tutorial

Blog — Six Feet Up

Calvin Hendryx-Parker - LinkedIn

Eclipse Insights: How AI is Transforming Solar Astronomy - YouTube

Level up your Python skills with our expert-led courses:

Sneaky REST APIs With Django Ninja

How to Work With a PDF in Python

Command Line Interfaces in Python