Cost not proportional to token use in some cases?
# 🤝help
w
I’m currently testing a bot that incorporates a Knowledge Base. Aside from the single KB query, the bot has only one AI task (GPT-3.5), which typically uses fewer than 1000 tokens per run. On runs where an answer is found in the first KB source (which is a web source), token consumption is around ten thousand tokens, with a cost of about half a cent ($0.005). However, in some cases, including when no answer is found in the KB, token use is roughly double at about twenty thousand, but the cost increases by a factor of ten or more, typically around ten cents ($0.10). I don’t see anything explicit in the logs or docs that explains this. I’d like to understand what is causing cost to be so far out of proportion with token use. All guidance much appreciated!
b
in the event debugger, you're able to see exactly which actions consume how many tokens
as well as their associated cost
ideally this should break it down
if it doesn't, can you provide some examples of these logs where the cost is increased disproportionately to token usage?
w
@bumpy-butcher-41910 thanks for the reply! Taking the case above where ~20k tokens are reported in the log but the ~$0.10 cost looks disproportionate: in the Event Debugger pane, the only entry that shows a token count is the AI task, which reports
Input Tokens: 820
The KB query does not show a token count at all. In the logs, the knowledgeAgent action has two entries,
Generating answer based on a 41 results
and
no helpful answer generated by KB
but it does not record how many tokens were used in its operation. The only other entry in the run that gives a token count is the final entry,
Billed tokens: 20,806 | Total tokens: 20,806 | Cost: $0.1054 | Cache savings: 0%
b
gotcha!
and what part of these numbers is disproportionate to what you were expecting?
w
Comparing that previous example with another run where the KB did find an answer: roughly half the previous example's token count was used (10,872 total), but the cost was roughly a nineteenth of it, at $0.0055. Here's the log summary of that run:
Billed Tokens: 10,872 | Total Tokens: 10,872 | Cost: $0.0055 | Cache savings: 0%
So comparing the two cases, the run in which the KB did not find an answer was roughly ten times more expensive per token used than this run. N.b. I've tried many runs of each type, and the results are always roughly what I've described here, so the phenomenon isn't a fluke.
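A quick back-of-the-envelope on those two log summaries (nothing assumed here beyond the numbers quoted above):

```python
# Effective cost per 1k tokens for the two runs, taken straight from the log summaries.
runs = {
    "no KB answer (expensive run)": {"tokens": 20_806, "cost": 0.1054},
    "KB answer found (cheap run)":  {"tokens": 10_872, "cost": 0.0055},
}

per_1k = {}
for name, run in runs.items():
    per_1k[name] = run["cost"] / run["tokens"] * 1_000
    print(f"{name}: ${per_1k[name]:.5f} per 1k tokens")

ratio = per_1k["no KB answer (expensive run)"] / per_1k["KB answer found (cheap run)"]
print(f"expensive run costs ~{ratio:.1f}x more per token")   # -> roughly 10x
```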
b
the only way I can explain this behaviour is if one run is using GPT-4 and the other is using GPT-3.5
without having done the test myself I can't confirm this
because input and output tokens are also billed at different rates, that will also cause some variation in cost
but not at the scale you've described here
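to put a rough bound on that source of variation: even in the extreme all-input vs all-output cases, the spread from the input/output split alone is only around 3x at typical rates (the rates below are illustrative list prices, not numbers from your logs):

```python
# How much can the input/output split alone move the bill for a fixed token count?
# Illustrative per-1k-token rates where output costs ~3x input (typical for these models).
input_rate, output_rate = 0.0005, 0.0015   # USD per 1k tokens, assumed

tokens = 20_806
all_input_cost = tokens / 1_000 * input_rate
all_output_cost = tokens / 1_000 * output_rate
print(f"all-input: ${all_input_cost:.4f}, all-output: ${all_output_cost:.4f}")
print(f"max spread: {all_output_cost / all_input_cost:.1f}x")  # -> 3.0x, far below the ~10x observed
```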
b
same thing happens when a run hits an error, the total cost is much higher
w
ah! I have "hybrid" selected as the "Model Strategy" in the Knowledge Agent config, do you think that might have something to do with it? Like it's switching to "Best" when an answer isn't found, or something along those lines?
b
ah there you go
that's exactly how the hybrid selection works
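in rough pseudocode, hybrid behaves something like this (an illustrative sketch only, not the actual implementation; the function, field, and model names are made up for the example):

```python
from dataclasses import dataclass

@dataclass
class Draft:
    text: str
    found_answer: bool
    model: str

def generate_answer(model: str, question: str, sources: list) -> Draft:
    # Stand-in for the real LLM call; only here so the sketch runs on its own.
    answered = len(sources) > 0
    return Draft(text="..." if answered else "", found_answer=answered, model=model)

def hybrid_answer(question: str, sources: list) -> Draft:
    """Try the cheap model first; escalate to the stronger, pricier one
    only when no usable answer comes back."""
    draft = generate_answer("gpt-3.5-turbo", question, sources)
    if draft.found_answer:
        return draft  # cheap path, billed at GPT-3.5 rates
    # No usable answer: re-run over the same sources with the stronger model,
    # which is why the "no answer found" runs end up costing far more per token.
    return generate_answer("gpt-4-turbo", question, sources)
```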
w
aha! thanks for confirming that -- and does the cost difference sound in the ballpark to you, i.e. GPT-4 being roughly 10x as costly per token as GPT-3.5?
b
you don't need to take my word for it:
you're charged at cost, we don't mark up the prices you see here
GPT-4 Turbo is roughly 20x more expensive than GPT-3.5 Turbo
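for a sense of scale, here's that ~20k-token run priced at each model's list rates (the per-1k rates below are OpenAI's published prices at the time, and the 90/10 input/output split is just an assumption for illustration):

```python
# Price the same token count at two models' list rates (USD per 1k tokens, assumed).
RATES = {
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
    "gpt-4-turbo":   {"input": 0.01,   "output": 0.03},
}

total_tokens = 20_806
input_tokens = int(total_tokens * 0.9)    # KB context dominates, so mostly input (assumed split)
output_tokens = total_tokens - input_tokens

for model, rate in RATES.items():
    cost = input_tokens / 1_000 * rate["input"] + output_tokens / 1_000 * rate["output"]
    print(f"{model}: ${cost:.4f}")
# -> gpt-3.5-turbo ~$0.012, gpt-4-turbo ~$0.25 for the same token count
```

a hybrid run that splits its tokens between the two models will land somewhere in between those two figures, which is roughly where your $0.1054 sits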
w
yep, that looks like it'd do it all right, hahaha! Thanks a bunch for helping me work through that -- I'm gonna try changing the config to "Fastest / 3.5" and see how the results look
b
\o/
w
Closing the loop on this: changing the Knowledge Agent config to "Fastest / 3.5" did indeed make the cost-per-token consistent across various queries, as expected. It also improved cache savings on repeated similar queries. (There was also an expected, notable performance degradation when switching to 3.5-only.) Thanks again for the help!
b
thanks for following up! glad to hear it 🙂