KB Doc Upload Rate Limited
# 🤝help
i
**What I want to do**: Upload hundreds of PDFs into the KB.
**What I expect to happen**: Batch uploads should work.
**What's happening instead**: Getting rate limited for a minute after 14 uploads; uploads in the queue are rejected (so the POST calls are still being made), and uploading picks back up after the timeout ends.

OK, before I make a screen recording: is there an advantage to having the files split up? I have each PDF classified and properly named, but if the KB is ignoring file names and simply aggregating every file into one single corpus, then I'll just write myself a script to combine all of my PDFs into a single corpus.
r
@icy-address-63961 hey, I'm wondering what the use case is for hundreds of files – do you think having the ability for KBs to answer from Tables would work for you, if you could store vectorized text inside them?
i
I'm dealing with hundreds of PDFs because it's the product of my specific data aggregation method. Are you saying that each document is stored in a table with vectorized text in it?
r
Well, I'm trying to understand if document upload is the right data source for your use-case or if we should perhaps develop a new data source that is more suited to your use case. Since we have Tables in Botpress and we're soon releasing the ability to make Tables a Data Source for KBs, I thought I would ask if that would be easier for you to manage than uploading hundreds of PDFs.
Maybe it would be easier to have a "Google Drive" or "AWS S3" data source, where Botpress would do the work of periodically syncing the documents?
But yeah, to answer your initial question – you are indeed rate limited in the Studio to 15 requests per minute on most operations.
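If you're scripting the uploads, a simple client-side throttle that stays under that limit should stop queued uploads from being rejected. A minimal sketch in Python – the endpoint URL, auth header, and form field here are placeholders for illustration, not the actual Botpress API:

```python
import time
import requests  # pip install requests

API_URL = "https://example.com/kb/upload"      # placeholder, not a real Botpress endpoint
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth

MAX_PER_MINUTE = 14  # stay just under the 15 requests/minute limit

def upload_all(pdf_paths):
    window_start = time.monotonic()
    sent_in_window = 0
    for path in pdf_paths:
        # If we've used up this minute's budget, wait for the window to reset.
        if sent_in_window >= MAX_PER_MINUTE:
            elapsed = time.monotonic() - window_start
            if elapsed < 60:
                time.sleep(60 - elapsed)
            window_start = time.monotonic()
            sent_in_window = 0
        with open(path, "rb") as f:
            resp = requests.post(API_URL, headers=HEADERS, files={"file": f})
        if resp.status_code == 429:
            # Rate limited anyway: back off for a minute and retry once.
            time.sleep(60)
            with open(path, "rb") as f:
                resp = requests.post(API_URL, headers=HEADERS, files={"file": f})
        resp.raise_for_status()
        sent_in_window += 1
```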
i
Yes, this would definitely be a better solution for my needs. Does this currently exist?
r
It doesn't, but I think that would make sense product-wise.
i
But again, does the KB look at file names at all? Does it aggregate my data into one corpus before vectorization?
r
What do you mean by aggregating your data into one corpus? Do you mean reducing the document recursively to make the embedding better? Or something else?
And yes, it does look at the file names.
i
I suppose I'm unclear about how it's working under the hood. You'll have to forgive me, as I'm fairly new to embeddings/LLMs/vector DBs. Let me just share what I'm doing exactly: I'm transcribing individual pieces of content from creators, summarizing that content to generate a filename, and then exporting the raw transcription + some other data points from the content's metadata as a PDF – hence why I have file counts in the hundreds.
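Roughly, the export step looks like this (a sketch in Python using fpdf2; the metadata fields and filenames are illustrative, not exactly what my pipeline uses):

```python
from fpdf import FPDF  # pip install fpdf2

def export_transcript_pdf(transcript: str, metadata: dict, out_path: str) -> None:
    """Write one transcript plus its metadata to a single PDF file."""
    pdf = FPDF()
    pdf.add_page()
    # Core fonts cover latin-1 only; for other characters, register a
    # Unicode TTF font with pdf.add_font() first.
    pdf.set_font("Helvetica", style="B", size=14)
    pdf.multi_cell(0, 8, metadata.get("title", "Untitled"))  # heading helps retrieval
    pdf.set_font("Helvetica", size=10)
    for key, value in metadata.items():
        pdf.multi_cell(0, 6, f"{key}: {value}")
    pdf.ln(4)
    pdf.set_font("Helvetica", size=11)
    pdf.multi_cell(0, 6, transcript)
    pdf.output(out_path)

# Illustrative usage: one PDF per piece of content, named from a summary.
# export_transcript_pdf(raw_text, {"title": "Episode 42", "creator": "..."}, "episode_42.pdf")
```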
r
OK, got it, makes sense. And how will you query the KBs? Will you use metadata to filter which files you search, or always search all documents?
i
Always search all documents. This will be a sort of general Q&A for their audience.
r
OK, makes sense. Are the file names important for the KBs in that case?
i
I'm not sure – I assumed it would be best to name them in case the names were somehow relevant to making search faster/more accurate.
r
OK, so if knowing the exact file (like Joe_Rogan_2021_e2345.pdf) isn't critical for your bot, you will probably see no big difference in results between one giant file and many small files.
i
Gotcha thank you!!
r
Just make sure to use headings in your PDFs, split your paragraphs, etc. Good formatting is still important and will impact responses!
In fact, separating each transcript with a page break is best (so embeddings won't be mixed up).
i
And to be clear on this, you mean that each transcription will have its own page, correct? Not just a blank page between each?
r
Yeah, each on its own page.
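If you go the combine-into-one-file route, something like the sketch below keeps that property automatically, since appending whole PDFs never merges pages – each transcript starts on its own page. This uses pypdf as one possible tool; it's not a Botpress requirement, and the directory/output names are placeholders:

```python
from pathlib import Path
from pypdf import PdfWriter  # pip install pypdf

def combine_transcripts(pdf_dir: str, out_path: str) -> None:
    """Merge all per-transcript PDFs into one file, each starting on its own page."""
    writer = PdfWriter()
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        # append() copies whole pages, so transcripts never share a page
        # and their embeddings won't get mixed up.
        writer.append(str(pdf_path))
    with open(out_path, "wb") as f:
        writer.write(f)

# combine_transcripts("transcripts/", "corpus.pdf")
```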