Knowledge base basic question
# 🤝help
c
I've specified 3 URLs in the knowledge base. When a lookup runs, only one of them is considered, while the other 2 are ignored, as if they don't exist. All of those URLs are simple pages without too much info. Why would that be? Any ideas how to fix it?
a
Hey Vlad, how are you adding the URLs into your bot?
c
Just in the Knowledgebase, in the place where I could specify the sources, added all 3 URLs in there.
One after another
a
interesting. How can you tell that only one URL is being considered?
c
One URL is the extensive one and contains a lot of info, while the 2 others are simple one-pagers. They have very specific information. When I ask about it, it says no answer found. It's impossible to make a mistake: even a keyword search would find that info in those one-pagers.
a
Are your URLs base URLs without any pages or anything after the .com/? The KB only considers the main root of the website and not specific pages
c
In here the first 2 are one-pagers, the 3rd one is the extensive one. So the extensive one is being considered here, while the first 2 are ignored.
a
Since we rely on a web search for these knowledge results, it's possible that the main website is just much more visible than the subdomains. Especially if it contains the same information as them.
Are you using a test question that can only be answered on one of the subdomains?
c
Yes, absolutely. Each of the one-pagers has its own unique information (and corresponding keywords) that the main one doesn't have.
a
what's your test question? I'll try to replicate it on my side
c
For example, like this: "Do you have a referral program?" (this is for refer.ladore.me) and "What are the main benefits of a facial treatment if I am 30 y.o.?" for facial.ladore.me
(the 2nd one may be arguable, as the main site has some benefits, but the one-pager has benefits by age group, so it should pick it up, but it's more subtle, while the referral program is totally unique)
a
So it looks like you might need to do some SEO on those subdomains. I put them in a knowledge base and those subdomains are included in the search scope. However, Bing and Google don't search those domains, so we get no results.
Alternatively, you can "print" these pages to a PDF and include the PDFs in the knowledge base as documents
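If you want to check indexing yourself, a `site:` query restricted to the subdomain is a quick test: no hits means the search engine has nothing to return for that host. Here's a rough Python sketch that builds those URLs for you to open in a browser. The helper name `site_query_url` and the search terms are just illustrations, and whether this matches the exact query the KB sends is an assumption on my part.

```python
from urllib.parse import urlencode


def site_query_url(host: str, terms: str,
                   engine: str = "https://www.bing.com/search") -> str:
    """Build a search URL restricted to one host via the `site:` operator."""
    query = f"site:{host} {terms}"
    return f"{engine}?{urlencode({'q': query})}"


# Hosts come from this thread; the search terms are illustrative.
checks = [
    ("refer.ladore.me", "referral program"),
    ("facial.ladore.me", "facial treatment benefits"),
]
for host, terms in checks:
    print(site_query_url(host, terms))
```

Open each printed URL; if the result page is empty, the page isn't indexed and a search-backed lookup won't find it either.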
c
Ah! Of course - they are meant for internal usage, they are not supposed to be SEO-ed 🙂 I didn't realize that the KB with sites works with real-time searches. I'll just make them into PDFs - thank you!
a
happy to help!
c
If you don't mind, just a quick follow-up - I've included 2 PDFs - but it still can't find the info. How would I debug this?
The last 2 PDFs are exactly those - corresponding to those 2 one-pagers
a
I use the bottom panel, and click on the events on the left side
Heads up, GPT is degraded rn so things may be wonky https://status.openai.com/
c
Ah yes, this I know - checking those - but I mean specifically to debug why the info in the PDF is ignored - when I ask about Referral Program, it has no idea, though a simple keyword "Referral Program" is only on that one-pager PDF - or ChatGPT status can affect this too?
a
ChatGPT could be affecting it. It also depends on how the PDF was generated: is it actual text or an image of text? An image of text is worse, so sometimes I'll try to copy+paste it into a Google Doc, export that Google Doc and see if it improves things
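If you'd rather check the PDF outside of any tool, here's a crude, standard-library-only Python sketch. It scans the PDF's streams for text-drawing operators (`BT` ... `Tj`/`TJ`), which image-only pages normally lack. Treat it as a heuristic rather than a real PDF parser: the function name `pdf_has_text_layer` is made up for this example, and it can misjudge encrypted files or streams that use unusual filters.

```python
import re
import zlib


def pdf_has_text_layer(data: bytes) -> bool:
    """Heuristic: True if any stream contains a text-showing operator."""
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        raw = match.group(1)
        try:
            # Most content streams are Flate-compressed; decompressobj
            # tolerates the trailing newline before "endstream".
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # uncompressed or differently encoded stream
        # BT opens a text object; Tj / TJ actually draw the glyphs.
        if re.search(rb"\bBT\b.*?(?:Tj|TJ)\b", content, re.DOTALL):
            return True
    return False
```

To try it, read the file as bytes, e.g. `pdf_has_text_layer(open("referral.pdf", "rb").read())` (the filename is a placeholder). If it returns `False`, the PDF is probably an image of text and needs to be re-exported or OCR'd.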
c
Got it, thank you very much.
It was an image indeed - I converted it to a text PDF. But it still doesn't show up in the results. I wish there was a way to debug it outside of Botpress. Any ideas on how to proceed to figure it out?
a
Give it a try again? Now that OpenAI is back up to speed, I'm not having any issues getting info from a PDF

https://cdn.discordapp.com/attachments/1129070140093378560/1129366860010373171/image.png

c
Indeed! This dependency on OpenAI is scary 🙂 Thank you!!