Kb charged 4,400 tokens for 100 kilobyte knowledge...
# 🤝help
g
I have 4 optimised small text documents in Ukrainian weighing only 100 kilobytes. But I pay 4,000 tokens per Knowledge Base query, which is more than people with an 80-megabyte database! Here are my documents
@fresh-fireman-491 You and I talked in another ticket about translation, but there I was able to solve the problem by disabling the Personality Agent, and I recreated the ticket with the other problem we discussed there.
I also had an English version of this database, and it was not 100 kilobytes but 500. In that case I paid only 2,400 tokens. Which is also not a small amount, but in any case the difference from what I pay now is huge!
f
What is the prompt? What is the output? What is the total amount of tokens used with that prompt?
g
User prompt, yes?
It doesn't matter what prompt it is. In any case I'm charged 4,400 tokens
It can be a simple prompt about prices or services with a different output, but in any case I pay 4k+ tokens
To give you a general understanding, I could send you the bot.
m
hi, i'll follow this conversation too since i have no answer here https://discord.com/channels/1108396290624213082/1202662622005305355
f
The amount of tokens in the prompt and output does matter.
Can you try running it again, write down the prompt, output, and amount of tokens used, and send it here?
g
Ok, I’ll do it later. Not at home😊
Hi, I'm here
Little answer in KB
Larger answer
No answer in kb...
f
Retrieval-Augmented Generation (RAG) combines the power of semantic search with the generative capabilities of large language models (LLMs), which in Botpress are GPT-3.5 or GPT-4. There are several steps in this process, each contributing to the overall token usage.

First, RAG performs a semantic search to identify the information in your KB most relevant to the query. This yields a set of documents or chunks. I am unsure of the number of chunks and the chunk size and overlap, but let's say, for example, it retrieves four chunks, each containing 1,000 tokens.

These chunks are then fed to an LLM; in Botpress it's either GPT-3.5 or GPT-4. The model you chose uses these chunks as context to understand the query better and generate a precise answer. If we continue with the example above, processing these four chunks alone accounts for 4,000 tokens.

Both the query and the response from the LLM consume tokens as well. This means the total token usage is not just the sum of the tokens in the retrieved chunks, but also includes the tokens used for the query and the LLM's output.

I hope this explains a bit why RAG tends to use a significant number of tokens. The need to process multiple chunks of context to generate a response naturally leads to higher token consumption. This is really apparent when comparing the token usage to more straightforward LLM queries that don't require external information retrieval, like in your last attached screenshot. These additional tokens ensure the LLM has enough context to generate the most accurate and relevant answer possible.

The total number of tokens used by RAG can vary a lot depending on the query and the number of relevant chunks retrieved. If it retrieved fewer chunks with a smaller chunk size, the LLM wouldn't have as much context, which can lead to the LLM not always being able to generate a relevant answer.
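To make the accounting above concrete, here is a back-of-envelope sketch in Python. All the numbers (4 chunks, 1,000 tokens per chunk, query and answer sizes) are hypothetical, taken from the example in the explanation; the real chunk count, chunk size, and overlap used by Botpress are not documented here.

```python
# Rough estimate of RAG token billing, assuming the billed total is
# (retrieved context) + (user query) + (generated answer).
# All figures below are illustrative, not Botpress's actual values.

def estimate_rag_tokens(num_chunks: int, tokens_per_chunk: int,
                        query_tokens: int, answer_tokens: int) -> int:
    """Total billed tokens for a single RAG query."""
    context_tokens = num_chunks * tokens_per_chunk  # chunks fed to the LLM
    return context_tokens + query_tokens + answer_tokens

# Example: 4 chunks of 1,000 tokens, a short query, a medium answer.
total = estimate_rag_tokens(num_chunks=4, tokens_per_chunk=1000,
                            query_tokens=50, answer_tokens=350)
print(total)  # 4400
```

With these assumed numbers the total lands around the 4,400 tokens reported above, which shows how the retrieved context, not the prompt itself, dominates the bill.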
Maybe @rich-battery-69172 or @famous-jewelry-85388 can share some more information about what is going on behind the scenes?
g
Thank you so much for your detailed reply, and for breaking down the process.
Of course the number of tokens varies depending on the context; that is at least fair. But why is the base value so large? 4,262 tokens for viewing a couple of files?
b
large relative to what? do you have examples of performing a similar operation with this knowledge base elsewhere that's using less tokens?
f
Have you tried using the same docs with Langchain?
g
As I said, I will try to test through GPT directly
No
I don't really know how to work with it
f
Fair enough. I think it would be interesting to see the difference. I deleted my Langchain projects, unfortunately, but when I am done with my assignment for school I can try to build a RAG bot and compare it.
g
Wow, I'd really appreciate it
b
I'm also curious about the outcome of these tests, you two keep me posted 😛
g
🙌🏾
Hi, as we discussed yesterday, I did a little test with my Ukrainian database. The model used was GPT-3.5. I loaded the same database and asked exactly the same question. The difference in tokens is 1,200. Why is that?
Also, an example of a question that is not in the database, namely: "How are you?" Asking GPT directly, I paid only 1,400 tokens. In Botpress, I paid 4,200 tokens for viewing the database
So here it is. I did a little meet-and-greet with the bot: asked about the agency's services, and then asked how it was doing. Testing directly against GPT, I paid only 6,004 tokens. In Botpress, 13,400 tokens, which is roughly twice as much. And that is with the Personality Agent and Translator Agent disabled.
In addition, the bot generated practically identical answers in GPT and Botpress
So the problem isn't in the prompt
b
very interesting, thanks for sharing this!
g
what is our next step?
@bumpy-butcher-41910
I'd like to remind you that the question is still valid.
f
What would the best next step be?
For you
g
I was told by Robert that they are looking for a solution to this problem. And just an answer would be the best solution for me. I wrote a message today only to keep the chat from being deleted; I'm still waiting for a reply from Robert.
b
we don't delete chats
f
Discord removes them from the visible list
b
and we're still trying to identify if this is a problem, or if this is just the nature of running requests through our service
yeah^ but they're not deleted
and they're only removed from the sidebar, not the actual help thread, which is what I check
f
True
Just to add, @gentle-baker-47357: you could look into doing the RAG part on a local machine and then using it from Botpress via an API, if the token costs are a big problem. If you are planning to use this in production, you could also look into doing the RAG part in the cloud instead so it's faster, but then you would have to compare the cost of running it in the cloud against the savings in AI Spend.
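A minimal sketch of what "doing the RAG part locally" could look like: run the retrieval step yourself so only the final LLM call is billed. The embeddings below are toy 3-dimensional vectors for illustration; in a real setup you would embed your chunks with a local embedding model and expose `retrieve` behind an HTTP endpoint that Botpress calls.

```python
# Local retrieval sketch: rank pre-embedded KB chunks by cosine
# similarity to the query embedding and return the top matches.
# Vectors and chunk texts here are made-up placeholders.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=2):
    """Return the texts of the top_k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]),
                    reverse=True)
    return [c["text"] for c in ranked[:top_k]]

chunks = [
    {"text": "Pricing: our services start at $50.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Opening hours: 9am to 5pm.",          "vec": [0.0, 0.9, 0.1]},
    {"text": "Payment plans and discounts.",        "vec": [0.8, 0.2, 0.1]},
]

# A query embedding that points toward the "pricing" chunks.
context = retrieve([1.0, 0.0, 0.0], chunks)
print(context)
# Only `context` plus the user query would then be sent to the LLM.
```

The trade-off is that you now pay (in compute) for embedding and search yourself, but the billed LLM tokens shrink to just the query, the selected context, and the answer.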
g
Thanks for working on it
And thank you for the advice! I really appreciate your time!
f
You are very welcome