I am using a knowledge agent, and I can see in the log that it uses a large number of source chunks: 34. I want it to use a maximum of 3. To be clear, by source chunks I mean the number of data source chunks that get passed to the LLM as context when generating a response.
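For context, here is roughly the behavior I am hoping to configure. The names below are hypothetical, just to illustrate what I mean; this is not a real API from the platform:

```typescript
// Hypothetical sketch: retrieve many candidate chunks, but only pass
// the top-scoring few to the LLM as context.
interface Chunk {
  text: string
  score: number // retrieval relevance score
}

const MAX_CONTEXT_CHUNKS = 3 // instead of the 34 I currently see in the logs

function selectContextChunks(retrieved: Chunk[]): Chunk[] {
  // Sort by relevance, descending, then keep only the top N chunks.
  return [...retrieved]
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_CONTEXT_CHUNKS)
}
```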
Separately, I also want to control the size of the response message; right now the responses are too long.
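Again, just to illustrate what I'm after (hypothetical code, not something I know the platform provides), even a simple post-processing step that trims the answer would do:

```typescript
// Hypothetical sketch: cap the generated answer at a fixed length,
// cutting at a sentence boundary so the reply still reads naturally.
const MAX_RESPONSE_CHARS = 400 // assumed limit, tune as needed

function trimResponse(answer: string): string {
  if (answer.length <= MAX_RESPONSE_CHARS) {
    return answer
  }
  const cut = answer.slice(0, MAX_RESPONSE_CHARS)
  const lastStop = cut.lastIndexOf('.')
  return lastStop > 0 ? cut.slice(0, lastStop + 1) : cut
}
```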
One last question: when I look at the log, I see the following:
"12:37:14
debug
dm
Billed Tokens: 12,577 | Total Tokens: 12,577 | Cost: $0.0130 | Cache Savings: 0%"
I thought billing was per conversation, not per token!
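If I do the math on that log line, $0.0130 divided by 12,577 tokens comes out to roughly $1.03 per million tokens, which certainly looks like per-token pricing rather than per-conversation pricing. Can someone clarify how the two relate?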