# 🌎general
Similar in terms of the structure of the information, while the information itself is of course different. For example, with financial reports, the structure of each report is usually similar. So will Botpress work well for knowledge bases with a different structure of information?
For example, if you're uploading monthly financial reports for June, July, August, etc, GPT isn't advanced enough to answer questions about trends or do forecasting.
A better way to do that would be to upload some kind of analysis document to the knowledge base.
We've had users build knowledge bases with 50+ documents before, so you're probably not running into any limitations there.
What I tried was asking a question about revenue per month, which is actually provided inside each document. However, the bot replied with something like: June is xxx, July is xxx, and for August no information is provided, even though documents were provided for June, July, and August. So I am wondering how the bot extracts information from the document knowledge base.
Aaaah, that might explain it. Here's a brief overview of how the KB works for documents:
1. When a document gets uploaded, it is split into chunks that are about 100 to 500 words long.
2. These chunks are stored in a database along with all the other chunks from all text and document KB sources.
3. When the user asks a question, the AI system retrieves a list of the 5 to 10 most relevant chunks.
4. GPT stitches an answer together using these top chunks.
So, if the top chunks are full of repeated data about the June and July monthly numbers, August's numbers might not make the cut. Or maybe the chunks that mention August don't include the actual figures. One way to counter this might be to add a knowledge source that contains this info in a condensed way, like a table or a short analysis. Even something like:
Here are the monthly revenue numbers for Q2 2023:
June: $10,000
July: $11,000
August: $12,000
Total: $33,000
could be enough to get you the right answers.
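To make the four steps above a bit more concrete, here is a minimal sketch of chunk-and-retrieve. It is not Botpress's actual implementation: the function names are made up for illustration, and a simple word-count vector stands in for the real embedding model so the example runs without any API key.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=100):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def vectorize(text):
    """Toy stand-in for an embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=5):
    """Return the k chunks most similar to the question, best first."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]
```

The thing to notice is the `k` cutoff in `top_chunks`: only those chunks ever reach the model, so anything ranked below the cutoff (August's numbers, in the case above) is invisible to it when it writes the answer. A short, condensed chunk that mentions all three months has a better chance of ranking near the top.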
Noted. Thank you for the explanation. Then I think the interesting part is step 3, when the AI retrieves a list of the 5-10 most relevant chunks. Are the chunks indexed using an OpenAI embedding model as well, or some other method? I ask because I have also tried a document of around 100 pages, and the bot failed to retrieve information from it.
Thank you for the suggestion as well. It seems to work if we provide condensed information. However, I think the original idea of using the knowledge base is to give the bot additional context by leveraging the power of the LLM, so we don't have to predict the user's questions. Please correct me if my understanding is wrong.
Yes, we are using OpenAI embeddings to create the index. @acceptable-gold-88171 have you heard of anyone successfully using a single 100-page PDF as a knowledge base source? That might be pushing the limits of our system.
Also, LLMs are not 100% magical (yet), and they are still lacking at analytical and reasoning tasks. They are, however, excellent at retrieval and summarization. Perhaps you could try a two-step approach where you use GPT to summarize chunks of that 100-page document, then upload those summaries to your knowledge base.
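A rough sketch of that two-step approach, under some assumptions: `summarize` is a hypothetical callable you would supply yourself (for example, a thin wrapper around whichever LLM API you use), and the function names and the 300-word chunk size are placeholders, not Botpress settings.

```python
def summarize_in_chunks(text, summarize, max_words=300):
    """Step 1: split the text into chunks and summarize each one.

    `summarize` is a hypothetical callable (str -> str) supplied by the
    caller; it is not a Botpress or OpenAI function.
    """
    words = text.split()
    chunks = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    return [summarize(chunk) for chunk in chunks]


def build_summary_document(summaries):
    """Step 2: join the per-chunk summaries into one text to upload as a KB source."""
    return "\n\n".join(f"Part {i}: {s}" for i, s in enumerate(summaries, start=1))
```

The output of `build_summary_document` is the text you would add to the knowledge base, either in place of or alongside the original document.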
I've tried 700-page texts, and it works well enough. The problem is formatting / context for search. I would convert the PDF to a txt file using an online tool (Google is your friend here 🙂 ), then just add a bunch of categories, or split it into parts and add descriptions to the KBs, otherwise you will get the issue you described above. Also, I may be wrong, but from my own observations I believe you can get better results when splitting your KBs, because it will attempt to search every KB individually and converge on an answer after.
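If it helps, splitting a converted txt file into labeled parts could look something like this. It's only a sketch: the function name, the header format, and the 5,000-word part size are assumptions for illustration, and you'd likely want to write a more meaningful description for each part by hand.

```python
def split_into_parts(text, title, words_per_part=5000):
    """Split a long text into parts, each prefixed with a short header line.

    The header gives every part some context of its own, so it can be
    added as a separate knowledge source.
    """
    words = text.split()
    bodies = [
        " ".join(words[i:i + words_per_part])
        for i in range(0, len(words), words_per_part)
    ]
    total = len(bodies)
    return [
        f"{title} (part {n} of {total})\n\n{body}"
        for n, body in enumerate(bodies, start=1)
    ]
```

Each returned string can then be saved as its own file and uploaded separately.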
Well noted. Thank you so much