KB adding non-desired, non-selected pages from web...
# 🤝help
w
I'm adding a Knowledge Base, and for its data source, I'd like to add a Notion page that we have exposed for this purpose. When I add the site and select "discover pages," the discovery finds over 5000 pages, most of which are unrelated Notion.so pages; it's easy enough to deselect these using the provided checkboxes. However, after deselecting the undesired pages and clicking "Add pages," over 80 undesired pages (same type, Notion general pages) are added. I suspect that this is happening because the bot is following the non-removable "Built by" link that Notion places on all exposed pages. The problem is that I don't want the KB to return answers based on data in these undesired pages. Is there a way I can get the KB to add only the pages I specify / leave 'checked', and not follow links etc? All guidance much appreciated!
r
Did you specify which KB the bot should answer from?
w
Thanks for the reply! I haven't used the KB in the bot yet. This question pertains only to the ingestion of pages from a web data source by the KB.
w
^^ aha! I didn't notice the option to choose "specific web pages" rather than "a website" -- thanks mucho for pointing that out! Will try it straight away
f
You are very welcome, and let me know if it works
w
Closing the loop on this: Choosing "Specific web pages" rather than "A website" did indeed help prevent undesired pages from being ingested! Big thanks to @fresh-fireman-491 for the help. Note: using an exposed Notion page as a data source did not produce very good results. Queries and prompts often did not get the expected responses, and a lot of Notion metadata was still present. I believe these issues stem from how exposed Notion "site" pages are structured, rather than from the ingestion process on Botpress' end.
f
Thank you for that!