Company knowledge base with multiple websites (inc...
# 🤝help
m
Hi there. We have different sources of information and no longer have control over WHERE is WHAT, whether there is duplicate information, and possibly which information is outdated. On top of that we have a lot of security concerns. The phases I have in mind:
* Minimal solution: 600+ PDFs and 2,000 HTML files (copied from the web)
* Second phase: full website search across 1 or more websites including subpages, plus the 600+ PDFs
* Third phase: same as the second phase, but the 600+ PDFs are located on the website and are included in the crawled data (no more separate HTML + PDF, as the PDFs are available on the website)
* Fourth phase: third phase + some websites that require login
* Fifth phase: fourth phase + Confluence pages
* Sixth phase: make it possible to compare different sources (what are the differences?) and find out which information might be outdated (Internet connection / Bing needed to check against current information, news, laws, etc.)

How far can I get with Botpress and some programming work? Can Botpress automatically include the subpages of a domain? How can I solve the login problem? Is there a way to use the data on a local server, and maybe even run the LLM model locally? My guess so far: a lot of those things are possible on their own. But all together?
a
Hey @miniature-battery-96440, I'll try to answer a few of these questions:
* Can Botpress handle 600+ PDFs and 2,000 HTML files? Probably, but it wouldn't work well.
* Can Botpress authenticate users? Not on its own. You would need to embed the bot in a page that is past the auth gate.
* Can Botpress automatically include subpages? If it's sub.domain.com, it has to be added manually. If it's domain.com/page, the search engine decides how deep to go.
* Can Botpress compare sources against each other and resolve conflicting information? No, this is not the intended use of Botpress.
* Is there a way to use data on a local server? If that local server exposes an API endpoint, sure.
* Can the LLM run locally? No.
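To illustrate the subpage point: since Botpress won't enumerate a site's pages and PDFs for you beyond what its search engine decides to index, collecting those URLs up front would be a separate scripting step. Below is a minimal Python sketch (not a Botpress feature) that breadth-first crawls one domain and gathers HTML page URLs and PDF links; the start URL, the page limit, and the same-domain rule are assumptions for illustration only.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example.com/"  # hypothetical start page
MAX_PAGES = 50                          # keep the crawl small for the example


def crawl(start_url, max_pages=MAX_PAGES):
    """Breadth-first crawl of one domain, collecting HTML pages and PDF links."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    html_pages, pdf_links = [], set()

    while queue and len(html_pages) < max_pages:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        html_pages.append(url)

        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc != domain:
                continue  # stay on the same domain; subdomains would need extra handling
            if link.lower().endswith(".pdf"):
                pdf_links.add(link)
            elif link not in seen:
                seen.add(link)
                queue.append(link)

    return html_pages, sorted(pdf_links)


if __name__ == "__main__":
    pages, pdfs = crawl(START_URL)
    print(f"{len(pages)} HTML pages, {len(pdfs)} PDFs found")
```

The resulting URL lists could then be fed into whatever knowledge base or import process you choose; re-running the crawl on a schedule is one way to pick up daily changes.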
m
Thanks for the fast response. Bad luck, I guess? We have websites, mostly with additional PDF information. So searching through all websites and pages, but also the PDFs, would be great. OK, we have the PDFs on their own as well. Then we have at least 2 or 3 tools with login that hold additional product information. And we have a lot of Confluence pages with some more information. Searching all of this would be great. How we can bring it into ChatGPT might be the question. As the websites change on a daily basis, it would be great to get the information directly from the website, so not an HTML export and then an import into the bot. This would be the perfect dream version for me. We are willing to put in some effort and money. I thought Botpress would be a great start, as I can specify different sources and types. 😦
Our main website alone has 1,800 pages, and we might need 3 or 4 websites to be included. The main website has 600 PDFs on its pages. The same PDFs might be inside some other sources. Confluence is not so important, or could be its own chatbot. But all the other sources would be great.
And as we are from Germany, we have huge security concerns! 🙂 That is a perfect start for an AI project, isn't it?
It looks like we are already out of luck with using www.ergo.de as a source?
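On the login question mentioned above: one common approach outside Botpress is a small script that authenticates once and reuses the session to fetch the gated pages, then feeds that content into whatever knowledge base is used. The sketch below assumes a simple form-based login; the URLs, form field names, and credentials are placeholders, and tools behind SSO or OAuth would need a different flow.

```python
import requests

# Hypothetical URLs and credentials for illustration only.
LOGIN_URL = "https://tool.example.com/login"
PROTECTED_PAGE = "https://tool.example.com/products/data"


def fetch_protected(username, password):
    """Log in once via a form POST, then reuse the session cookie for gated pages."""
    with requests.Session() as session:
        resp = session.post(
            LOGIN_URL,
            data={"username": username, "password": password},
            timeout=10,
        )
        resp.raise_for_status()
        page = session.get(PROTECTED_PAGE, timeout=10)
        page.raise_for_status()
        return page.text


if __name__ == "__main__":
    html = fetch_protected("demo-user", "demo-pass")
    print(html[:200])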
a
You're probably better off with a more data-focused tool than Botpress. Certain things that sound important to you, like automatic website scraping and deduplication, are not part of Botpress Cloud and might be a bit out of scope for us.