:llama: :flag_es: Try Llama 3 in Botpress and solv...
# 📖tutorials
q
JUMP HERE FOR THE BEST INSTRUCTIONS https://discord.com/channels/1108396290624213082/1230732214891712674/1230928224079314977 Let’s start testing the new Llama 3, the best open-source AI model out there. I’ll be posting some results here. Here I'm testing it for the first time using the 'completions' endpoint and the unofficial llama-api (APIs for open-source LLMs), since I couldn't find a 'chat' endpoint there yet.
Copy code
js
const main = async () => {
  try {
    // Call the unofficial llama-api 'chat/completions' endpoint
    const response = await axios.post(
      'https://api.llama-api.com/chat/completions',
      {
        messages: [
          {
            role: 'system',
            content: "Assistant is a large language model trained by META."
          },
          {
            role: 'user',
            content: workflow.question // the user's question from the flow
          }
        ],
        model: 'llama-3-8b',
        max_tokens: 800,
        stream: false
      },
      {
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${env.LlamaAPI}` // API key stored as a Botpress env variable
        }
      }
    );

    // Store the model's answer in a workflow variable
    workflow.llamaAnswer = response.data.choices[0].message.content;
  } catch (error) {
    console.error('Error fetching completion:', error);
  }
};

await main();
https://cdn.discordapp.com/attachments/1230732214891712674/1230732215067869246/Screenshot_from_2024-04-19_07-04-35.png?ex=663463b1&is=6621eeb1&hm=14fcc04bb4a1fe6a5819a139998194227085288f51ab269c6502923be0379909&
j
Oh, maybe because I'm in Vietnam
But it's a very nice website
f
Amazing as always
Llama 3 had a perfect release timing
q
First tests: sometimes it gives really good, long answers to basic questions, but other times it cuts them off weirdly in the middle, even though I'm just using the usual OpenAI-style API syntax that every other company also uses. Let's test more and see what changes are needed.
f
It needed some weird stuff
I thought they did that, when we would use their API, but I could be wrong
q
I recommend that every bot builder and open-source enthusiast join and test until it works the way we need it to in Botpress; some people (even in this server) are already saying it's better than GPT-4, Claude 3, and Gemini Pro (obviously, that depends on the task). Let’s test together, or you can just wait for @fresh-fireman-491's more advanced tutorials and YouTube videos, after which everything will always work perfectly.
I'll start testing it with some coding challenges, then I'll learn more and update the code and instructions if needed.
Let's add console.log back to the code to see the full response message
Copy code
js
      workflow.llamaAnswer = response.data.choices[0].message.content;
      console.log("TEST:" + JSON.stringify(response.data.choices[0].message))
After the answer, it returns a new line, an opening square bracket, and a 'function call: null' message. So, at least we know we need to deal with those. https://cdn.discordapp.com/attachments/1230732214891712674/1230747755136290866/image.png?ex=6634722b&is=6621fd2b&hm=101804474c581c9abe7e1c2f69db296c6a9d932026fee4b5d881ef7cfcf17c24&
f
Heading out for school now, but I can try and see if I can find a fix for it later today 🙂
f
Although I am sure that you can have one ready much before that 🦸‍♂️
f
I was able to find the post from yesterday https://github.com/meta-llama/llama-recipes
I didn't really think that we would need it when we used their API, but we might
q
Thanks! I think that might be it! 🦸‍♂️ 💎 🫡 Some bot builders asked for a quick test code to play with. This worked well when I tried the same first two questions all bot builders around the world also use: 'What is Botpress?' and 'Who is Elon Musk?' So I decided to write the code here. But after more testing, I found out there are still many issues. I'm going back to debugging mode 🛠️
f
Best of luck!
The 400B+ model is probably going to be better than all of the current models out there
The worst part is the small 8K context window
The 400B model is still training, but their checkpoint version is on par with GPT-4
Which is crazy, because GPT-4 reportedly has 1.8 trillion parameters
q
I’ve seen some advanced tutorials using 'RoPE scaling' to increase the LLM context window or even use an unlimited context window. I haven’t studied it much myself, but I’m guessing it’s a technique only for fine-tuning self-hosted open-source LLMs using Hugging Face + Python + Google Colab, etc., and there the unlimited context window actually works well and solves the previous fine-tuning issues. It's not a technique for API calls and using the LLM, but since they've already solved this fine-tuning part, we can also believe that many AI builders are working hard on techniques for API calls and end users as well, so we don't need to ask questions and request help from the '8K bimbo'.
Now it removes all the weird characters I could find in my testing and looks okay in Botpress, so it's a good enough starting point for us to develop it for more advanced use cases and chatbots.
Copy code
js
const main = async () => {
  try {
    const response = await axios.post(
      'https://api.llama-api.com/chat/completions',
      {
        messages: [
          {
            role: 'system',
            content: 'Assistant is a large language model trained by META.'
          },
          {
            role: 'user',
            content: workflow.question
          }
        ],
        model: 'llama-3-8b',
        max_tokens: 800,
        stream: false
      },
      {
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${env.LlamaAPI}`
        }
      }
    )

    // Clean up the artifacts this endpoint appends after the actual answer
    let formattedResponse = response.data.choices[0].message.content
      .replace(/\n/g, ' ') // newlines to spaces
      .replace(/\[",\"function_call\":null}/g, '') // the trailing 'function_call: null' fragment
      .replace(/[<>\[\]\/]/g, '') // leftover brackets and slashes

    console.log('TEST:' + JSON.stringify(response.data.choices[0].message))
    workflow.llamaAnswer = formattedResponse
  } catch (error) {
    console.error('Error fetching completion:', error)
  }
}

await main()
https://cdn.discordapp.com/attachments/1230732214891712674/1230777626638155786/Screenshot_from_2024-04-19_10-05-29.png?ex=66348dfc&is=662218fc&hm=559749b934a5a4e93a3398c750dd4896e9a4b74a6d779c3be04695d69e45c800& https://cdn.discordapp.com/attachments/1230732214891712674/1230777626965049364/Screenshot_from_2024-04-19_10-05-15.png?ex=66236a7d&is=662218fd&hm=91023c8c17eff5c56b5f1a4ad1355194f8d565ae2707b937c059581e702fbba2&
@fresh-fireman-491 When you are trying to find a better or correct way to do this, responses can end like this (after the actual answer) \n<</","function_call":null} .</","function_call":null} [","function_call":null}
but now it works, so I can start testing if we can already replace other LLMs with this one (definitely Llama2, but how about Mixtral-8x7b, Claude-3, etc.).
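If those endings follow a consistent pattern, one hedged alternative to replacing stray characters one by one is to cut the answer at the first artifact marker. A minimal sketch (the marker string is an assumption based on the endings above):
Copy code
js
const raw = response.data.choices[0].message.content
// Assumption: the junk tail always contains the literal '","function_call":null}' fragment
const cut = raw.indexOf('","function_call":null}')
workflow.llamaAnswer = (cut === -1 ? raw : raw.slice(0, cut))
  .replace(/[\n<>\[\]\\\/"]+\s*$/, '') // strip any stray characters left before the marker
  .trim()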
After one day of testing and so many issues, I have to say there might be much better ways to try and use Llama 3, and I'm going to try them next:
- https://huggingface.co/chat
- Meta Llama 3 with Replicate API
- Meta Llama 3 with Azure AI Studio
- Meta Llama 3 on Google Cloud Vertex AI
HuggingChat might be the best way for bot builders to test the biggest Llama3 chat model for free, to see if it's worth it and can replace even GPT-4 and Claude-3 in some of our use cases. https://cdn.discordapp.com/attachments/1230732214891712674/1230909126956285983/image.png?ex=66350875&is=66229375&hm=e157aaad28feef022f8aa7bca4ef00458de2114737db9b998a3bb19bd41bc143&
If you have access to meta.ai, let us know how it works. It aims to be a ChatGPT competitor, using open-source Llama 3 with a free image generator (competing with DALL-E).
Secondly, Llama3 is said to excel in coding and math, both areas I find extremely useful.
It can be a really big competitor for OpenAI (some even say it's game over already, and Meta won 🤣 ) "A better assistant: Thanks to our latest advances with Meta Llama 3, we believe Meta AI is now the most intelligent AI assistant you can use for free." "You can use Meta AI in feed, chats, search and more across our apps to get things done and access real-time information, without having to leave the app you’re using." "Meta AI’s image generation is now faster, producing images as you type, so you can create album artwork for your band, decor inspiration for your apartment, animated custom GIFs and more." "Built with Meta Llama 3, Meta AI is one of the world’s leading AI assistants, already on your phone, in your pocket for free. And it’s starting to go global with more features. You can use Meta AI on Facebook, Instagram, WhatsApp and Messenger to get things done, learn, create and connect with the things that matter to you." https://about.fb.com/news/2024/04/meta-ai-assistant-built-with-llama-3/
h
Hello guys, happy to see this tutorial as usual. I got access to Meta AI last night and was chatting with it directly on WhatsApp until like 3 AM. I haven't built anything with it yet, but it seems like a very solid model, and this might be another ChatGPT moment; this model is that big. I think people just haven't caught up yet. The crazy part is rolling it out built into WhatsApp to 2 billion users 👀🔥 It's accessible in Messenger too. I think this LLM is going to move the whole world toward embracing AI more than ChatGPT did, because now EVERYONE (not just tech people) can start engaging with an LLM easily in their messaging platform rather than on an external site like OpenAI's
And also Groq just announced you can access Llama 3 via their API
q
⭐ ⭐ ⭐ ⭐ ⭐
@hundreds-battery-97158 Thank you! You solved this API issue once and for all. It wasn't available on Groq or Hugging Face when I checked, so I'll change this to be the recommended way and update the chatbot code to use Groq.
h
That was quick
q
Copy code
js
// Groq API key stored as a Botpress environment variable
const GROQ_API_KEY = env.GROQ_API_KEY;

// OpenAI-style chat payload; Groq serves Llama 3 behind the same API shape
const data = {
  messages: [
    {
      role: 'system',
      content: `Assistant is a large language model trained by META.`
    },
    {
      role: 'user',
      content: workflow.question
    }
  ],
  model: 'llama3-8b-8192'
};

const main = async () => {
  try {
    const response = await axios.post('https://api.groq.com/openai/v1/chat/completions', data, {
      headers: {
        Authorization: `Bearer ${GROQ_API_KEY}`,
        'Content-Type': 'application/json'
      }
    });

    // Store the answer in a workflow variable for the next card
    workflow.llamaAnswer = response.data.choices[0].message.content;
    //console.log(response.data.choices[0].message.content);
    return workflow.llamaAnswer;
  } catch (error) {
    console.error(error);
    throw error;
  }
};

await main()
  .then(message => console.log('Groq message:', message))
  .catch(error => console.error('Error:', error));
h
Just about to export. Let's see what new things we learn from devmik tonight
Speed⚡⚡⚡
q
Yessssss
Llama 3 in Botpress with the Groq API is quite fast, and in my opinion, without any extra use-case-specific prompting (using only the system prompt 'Assistant is a large language model trained by META'), it gives almost perfect answers. It even automatically bolds the relevant parts of the answer, something other models struggle with even when instructed to do so. https://cdn.discordapp.com/attachments/1230732214891712674/1230962445154123826/Screencast_from_2024-04-19_21-48-18.webm?ex=66353a1d&is=6622c51d&hm=18e2ab496967ef5ba955099343cd09dd7e8992a7696e88243d877b9927ea9f22&
I'm sure everyone already knows this, but just to encourage more bot builders to test it and share their experiences (in case there's still someone who doesn't know): in addition to being fast, accurate, and practically perfect, Llama 3 is also free to use in Botpress with the Groq API (at the moment) 🦙 🚀
I'm definitely going to replace many parts of my previous multi-agent workflows with Llama 3; there's no question about it. After only a few hours of testing, and many hours of debugging before that, I'm 💯 confident in the benefits and improvements that Llama 3 will bring to all my current systems.
f

https://youtu.be/0AaNT7XO41I?si=Y59ryx3EBTIePrkc

Test of the model
q
Watching these will help those of us without meta.ai access understand more about Llama 3's capabilities. When using the API, it’s also nice to see the performance boost when changing the general system prompt to use-case-specific ones, like in my four little chatbots: coding helper, research assistant, think step by step, and create smart contracts.
a
Hi, how can I get a Groq API key? It seems it's only for premium subscribers.
q
When you create an account and go to 'billing', you can see this. After creating an API key, it's free to use. "Due to the overwhelming interest in Groq, we're excited to extend our free beta of GroqCloud to increase capacity and meet demand. This extension means a slight delay in launching our paid access with higher rate limits. We encourage you to keep sharing your Groq-powered apps and demos with us for potential spotlighting. Throughout this extended beta, we'll keep enhancing our platform based on your feedback, adding new models and features to remain the fastest and most developer-friendly inference platform." https://cdn.discordapp.com/attachments/1230732214891712674/1231145314656976897/image.png?ex=6635e46c&is=66236f6c&hm=376e9be30204447d03b90417ed4bc8da7c4b38d97a5ec96cde04e726269bc9a5&
@agreeable-accountant-7248 Some early testers have reported the speed difference when using Llama 3 and GPT-4, both giving the same results. Llama 3's response times in real-time conversation were around 1 second (all between 500-1500 milliseconds), compared to using GPT-4 where response times were between 4-20 seconds. Same task, both giving correct or equally good answers. We need to test this too in Botpress; what a great way to speed up chatbot flow for our users. If a conversation includes multiple tasks, assistants, or agents, having the total response time (all tasks combined) be seconds rather than minutes sounds much better.
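A simple hedged way to measure that ourselves in Botpress, reusing the data object from the Groq code block above:
Copy code
js
// Wrap the same Groq call with timestamps to log the round-trip time
const start = Date.now()
const response = await axios.post('https://api.groq.com/openai/v1/chat/completions', data, {
  headers: {
    Authorization: `Bearer ${env.GROQ_API_KEY}`,
    'Content-Type': 'application/json'
  }
})
console.log(`Llama 3 via Groq answered in ${Date.now() - start} ms`)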
I have been practicing with Cohere since you mentioned it. It would be nice to see your results comparing these, or even using them together in the same chatbot workflow, doing different tasks. https://cdn.discordapp.com/attachments/1230732214891712674/1231153216482181130/image.png?ex=6635ebc8&is=662376c8&hm=dfc059790a8d9eac0516d74bfb3ab1bec15085c46996307bd3e2ee57742642a9&
a
wow, sounds amazing. I have been trying almost all known LLMs and derivative approaches, as, for my use case, I can't find something totally trustworthy. I have tested OpenAI, Claude 3, Cohere, Perplexity, Gemini, etc., and it seems I can't find what I need. I have several problems when I try to find events and plans in the future. The best result until now is Cohere, but even trying different prompt approaches, sometimes it gives me back info from the past or, even worse, info that is not accurate or even false. So sometimes I give the user info about concerts, museums, or activities that are not true, with wrong dates, and it's annoying. So I will test Llama 3 and let you know if it could replace Cohere. Thanks all of you for your help, it's amazing how fast AI advances!!
this should be the best approach, but costs rise and it would be more difficult for a production project. The balance between AI power (more prompt tokens, interconnected bots for particular tasks, etc.) and cost is a difficult game 😊
first test done and... it works really, really fast. I need to check if events and dates are accurate, but it seems promising. One more question (sorry for spamming the chat 🙂): do you know if there is any way of sending previous questions and answers so Llama 3 can have context for the next questions?
f
You should be able to include them as assistant messages
It has a knowledge cutoff
March 2023 for the 8B and December 2023 for the 70B
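Something like this, as a minimal sketch (previousAnswer is a hypothetical variable holding the reply you stored from the last turn):
Copy code
js
// Replay earlier turns before the new question so the model has context
const messages = [
  { role: 'system', content: 'Assistant is a large language model trained by META.' },
  { role: 'user', content: 'What is Botpress?' }, // the earlier question
  { role: 'assistant', content: previousAnswer }, // the model's earlier answer
  { role: 'user', content: workflow.question } // the new question
]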
q
@agreeable-accountant-7248 "Best result until now it's Cohere but event trying different prompt aproachs sometimes it gives me back info from the past or even worst, info that it's not accurated or even false." I can share my solutions or ideas to 'better prompting' over the weekend, which have worked the best for me so far (in my four chatbots for 1) code challenges, 2) research, 3) solving problems step by step, and 4) creating blockchain smart contracts), so we can compare ideas and results. Some ideas for prompting to make AI follow instructions better are mentioned here https://discord.com/channels/1108396290624213082/1230289745171710164
Here's a short video showcasing a chatbot where I try to solve coding challenges and send the previous questions and answers (stored in variables) to the next LLM to further improve the results. These questions and answers are quite long, which is why I needed to refine my prompting strategies. For example, adding all important variables at the end, as mentioned in that earlier link, significantly improved the quality of AI responses. Placing variables that contain a lot of text throughout the prompt can confuse the AI and cause it not to follow instructions as effectively. The project in the following link is not fast; its main goal is to give the best possible quality answers, which is why I mentioned this (related to your comment about costs): "I've tested this a lot on research topics that are complex (for me, anyway). It costs around $0.20-$0.40 max (Botpress AI token spend + API calls combined). So, it's not a solution for every use case because of the costs. But for what I need right now, if a company can build much more useful guides from important stuff than they can with GPT-4 alone, and share those with 100 workers, they won’t mind spending $0.20 or even $2.00." https://discord.com/channels/1108396290624213082/1227955163373768714/1229064978443538554 And please @agreeable-accountant-7248, spam the chat! You have quite a unique style in your messages, where you thank everyone after a long message in which you have given us so much valuable information, and we should be thanking you. I like that style a lot!
a
thanks for your message!! We are living in a time in which we should all help each other, but unfortunately there is a lot of hate and anger on social networks and in life in general. My parents taught me that you should always have respect for others and be grateful; you never know the personal story behind each person. Personal thoughts apart, I have been testing Llama 3. Fastest answer of all the LLMs I have tested. I have included chat history to have context and to be able to chat directly with the user, and it works really well. I have gotten some fake results for events (Iron Maiden playing today in Marbella, Spain :-p), so I think I need to improve my prompt to avoid this kind of result. Having a free and fast LLM that works really well, I now need to focus on eradicating hallucinations and fake results. Maybe, now that I can reduce the LLM cost, I could check the results with another AI so I never give the user fake information. I will let you know if I can move forward on fake event answers. Thank you and have a nice weekend!!
j
So Llama AI is better than ChatGPT?
q
I guess it depends on the use case, so I recommend using the free Groq API (code, bot file, and all the instructions mentioned here) to get access to Llama 3 (which Meta.ai also uses) and testing it in Botpress and your other AI projects. In my opinion, and for the tasks I need, I'm really happy with Llama 3, its quality results (GPT-4 level), and speed (sometimes 20x faster than GPT-4, which our chatbot clients will obviously appreciate). I'm going to update many of my projects to use it.
j
👍
q
@jolly-policeman-82775 I'm sure you'll build something really cool with Llama 3 🦾 If you haven't tried it yet, check the short video above to see the speed. That's one thing many bot builders have been surprised by (in addition to those quality results from an open-source model).
j
Alright
I'll check it out when I have some time on my hands.
j
But I've quit building bots 😁 now I do customization for bots
q
that amateur video I created, not the YouTube one
j
Cool, quick and fast.
But why not use a regular Query Knowledge Base
with the knowledge base integrated with the internet?
q
Those are all good ideas. This was for testing the API and demonstrating the speed and quality. I have asked many similar short questions, difficult multi-step tasks, and coding challenges, and I like the first results a lot 🚀
we need to properly test the API (Groq) and LLM (Llama3) to see if we can really replace some important parts in our chatbots and other projects with it.
f
Llama 3 is horrible in any language that is not English. They did say that too, but I tested it yesterday and it was bad. It's also a pro, though: it means that a larger part of their training data was in English, which makes it even better at that language.
q
That's really good to know! 💡
@hundreds-battery-97158 @agreeable-accountant-7248 Some prompting ideas and strategies https://discord.com/channels/1108396290624213082/1231503419278360576
c
100%. GPT-4 can be very slow, almost too slow for production tasks imho. Are there data security risks with Groq/Llama?
Interesting. In that case, we could potentially still use Llama to retrieve data and use another LLM like GPT-3.5 to translate the results. Really curious now 💫
My thoughts on this: the major drawback of relying on the internet for real-time queries is the variability, accuracy, and trustworthiness of the information retrieved. Unlike curated knowledge bases, the internet contains a lot of junk information, which can make your KB unreliable.
j
Oh, you changed your pfp
cool
a
Morning. Yesterday I was testing both Llama 3 (8B and 70B versions) and Cohere all day. Still fighting to get real results. It's starting to get frustrating because even with easy tasks like "tell me ice cream shops in the center of Malaga", the results I get are not 100% real. I'm testing two prompts (best until now): > 1) Date: Sunday, April 21, 2024. > > Consultation: Plans in Malaga and province for today > > Instructions for the model: Perform searches and provide verified and updated information on the user's question that may be related to events, plans, leisure activities, museums, concerts, exhibitions and popular festivals in the towns of Malaga. Use reliable sources to verify the existence and availability of each option mentioned. Only include in your answer those places or events that are confirmed by these sources. > > Answer requirements: > - Include names of confirmed establishments or events. > - Provide brief descriptions of each option. > - Detail service hours or duration of the event. > - Indicate delivery or access methods. > - Offer average user ratings according to the sources consulted. > > If you cannot find confirmed information about a query, it is preferable to report that no available options have been found rather than providing unverified information. > > Mandatory sources to verify information: > - Google > - TripAdvisor > - The Fork > - Malaga leisure and culture agenda > - Fever > - Eventbrite - Free events in Malaga > - Malaga of Culture - Agenda > - La Opinion de Málaga - Events > - City of Malaga - Agenda > - Today Malaga - Activities > > Conclusion: Review these recommendations and tell me if you need help with anything else or if you want additional information about a specific site or event.
> 2) ### Prompt: Local events, plans, activities and restaurants in the province of Málaga > **Date**: Sunday, 21 April 2024. > #### Context: > You are an AI trained to provide real-time, verified information about local events, plans, activities and restaurants in the province of Málaga. Your role is to assist users in finding reliable and current suggestions for genuine and quality answers. > #### User Query: > Plans in Málaga and their towns for today > #### Task: > 1. **Search Phase**: Conduct a thorough search for the user query. Extend this search to include current events, activities, cultural sites, concerts, and local festivities. > 2. **Verification Phase**: Cross-verify the existence and operational status of these places and events using trusted sources. > 3. **Response Construction**: Compile your findings into a structured response that includes: > - Names of verified establishments or events. > - **Brief Descriptions**: Overview of each option. > - Service Hours or event duration. > - **User Reviews**: Average ratings from platforms like Google, TripAdvisor, or The Fork. > > #### Important Instructions: > - **Accuracy**: Ensure all information is current and verified. Prioritize real data over generating unverified content. > - **Clarity**: Break down the response into clearly defined sections. > - **Contextual Relevance**: Keep the response focused on Málaga-specific options. > - **No Data Available**: Clearly state if no verified information is found rather than offering potential but unverified details. > > #### Sources for Verification: > - Google, TripAdvisor, The Fork > - Local websites and event calendars like [Málaga Cultural Agenda](https://www.agendaculturalmalaga.com) and [La Opinión de Málaga](https://www.laopiniondemalaga.es) > > #### Conclusion: > Review these recommendations and let me know if you need further assistance or additional details on any specific location or event.
It would be easier if I could set the websites where it has to search, like in Botpress and Dante AI, but the cost is too high for a B2C production solution, so I'm still looking for a balance between quality and accuracy of answers and costs 🙂
q
Let's make it our holy mission to solve this: "It's starting to get frustrating because even with easy tasks like 'tell me ice cream shops in the center of Malaga', the results I get are not 100% real." Solving that so you can say, "Now it works perfectly!" would benefit so many bot builders at once.
Have you tried using Cohere or Llama 3 together with Serp API? I've used it in some of my Botpress and other projects. You can use it to perform searches that focus on specific websites by using the site search operator in your queries.
Copy code
js
malaga ice cream shops site:malaga-tourist-info.com
a
No, I haven't tried it. Can you add more than one site? Because Cohere lets you add only one
q
Copy code
js
malaga ice cream shops site:malaga-tourist-info.com OR site:icecream-malaga.com
I haven't tried with that OR operator yet, I just found it when you asked.
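In case it helps, here's a minimal sketch of sending such a query to SerpAPI from an Execute Code card (the SERP_API_KEY env variable name is an assumption for this example):
Copy code
js
// Site-restricted Google search through SerpAPI (serpapi.com)
const serp = await axios.get('https://serpapi.com/search.json', {
  params: {
    engine: 'google',
    q: 'malaga ice cream shops site:malaga-tourist-info.com OR site:icecream-malaga.com',
    api_key: env.SERP_API_KEY // assumed environment variable
  }
})

// organic_results is SerpAPI's field for the main search hits
workflow.searchResults = (serp.data.organic_results || [])
  .map((r) => `${r.title}\n${r.snippet}\n${r.link}`)
  .join('\n\n')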
a
I could test it, but I would need more requests per month. If it works fine, I suppose I could talk with the sales team and negotiate https://cdn.discordapp.com/attachments/1230732214891712674/1231562068579319890/image.png?ex=6637688e&is=6624f38e&hm=bc231e46a38fcb6df6a440fbc5ddc69728dc9c96b1f609204b0ea71223e0b1b9&
q
Yes, it comes back to accuracy vs. cost.
a
Maybe it's a matter of prompt chaining. I mean, most LLMs can scrape web content. So maybe something like: 1) Analyze the user question 2) Scrape website information from these websites: site1, site2, site3, site4, etc. 3) Filter results based on the date of the question. If there is no date, use today as the date filter. Never show events, plans, or activities with a date before today. 4) Extract relevant info like descriptions, timetables, and average user reviews 5) Present the info to the user. I'm wondering if I should create different prompts for each point and test it
q
I have also used Bright Data for example with my little 'YouTube Trending Videos' Botpress project. Good article https://medium.com/nerd-for-tech/5-serp-apis-that-can-beat-google-at-their-own-game-665b86aa822c
c
I think that instead of scraping, you could start building your own dataset to use as a solid foundation. You can use web scrapers and other API services to build this database. This is much more secure and will not give the hallucinatory responses LLMs currently give on real-time data. If you take this route, then eventually restaurants will see the benefit and come to you with their data, which you can sell through an API. @agreeable-accountant-7248
@quick-musician-29561 can I post in this thread on RAG or is there another topic better suited ?
q
I would be happy to hear and learn from your RAG ideas here also. I don't think we have a better-suited thread for it yet 🫡
c
then let me post this video on RAG from the Groq team, which really showcases why RAG can be beneficial compared to a traditional KB: (feel free to migrate this post to another topic)

https://www.youtube.com/watch?v=QE-JoCg98iU

q
Let's watch it now instead of always adding these to our future playlist.
c
yes, currently watching/making notes ha 😛
it seems your initial response about the KB vs Pinecone might be on the mark: the techniques are very similar in that we are injecting knowledge before sending it to the LLM. However, I think there is still a difference in that the KB under the hood does not use a Pinecone-like structure (I couldn't find any information about this). Maybe an official developer has more information. If not, why does the KB tend to hallucinate so much? RAG systems are designed to help prevent this.
q
I based my idea that they both work similarly on the fact that they both return 'chunks' to us, a method used in machine learning: pieces of text or information retrieved from a large knowledge base or database to help generate a response. We can also send these 'chunks' from Botpress to other LLMs like this https://discord.com/channels/1108396290624213082/1229502740854472945 (copied from there:) "You can change the number of chunks (5 in this example) sent to other LLMs so that they have enough related context in this code workflow.chunks = event.kb.results.slice(0, 5).map((a) => a.dsFriendlyName + "\n" + a.content).join("\n\n") Don't send all 50 chunks from the Botpress KB; it costs too much and may not even fit the LLMs' context window. Originally, I built this using the Cohere API and Pinecone vector database, but here we are all about doing things in Botpress." ⬆️ maybe this is another idea you can consider, @agreeable-accountant-7248: do the web search in Botpress; it can do it well. Then send those chunks to Llama 3 for speed and test the quality of the response (versus doing it all in Botpress, which you mentioned in other threads you already tested earlier).
in terms of searching, a regular database looks for exact matches, while a vector database looks for things that are similar in meaning or use.
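A toy sketch of that difference (illustrative only, with made-up 3-dimensional embeddings):
Copy code
js
// Cosine similarity: how a vector DB ranks stored chunks against a query embedding
const cosine = (a, b) => {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0)
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return dot / (norm(a) * norm(b))
}

// A SQL `WHERE text = 'ice cream shops'` returns only exact matches;
// a vector DB returns the chunks whose embeddings score highest against
// the query embedding, even when the wording differs
console.log(cosine([0.9, 0.1, 0.3], [0.8, 0.2, 0.25])) // ~0.99: close in meaning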
f
I haven't tried, but maybe the Google Maps API could work for the locations?
c
"Chunks" in Botpress: When Botpress returns "chunks" of information, this usually refers to segments of preconfigured responses or parts of scripts that are triggered based on user interaction. This is not necessarily indicative of a vector search approach.
q
Yes, I think you are right! 💡 Based on my testing, both work equally well (for coding knowledge bases).
When we make a query to the Botpress KB, it returns 50 chunks, each containing information closest in meaning to the query. Normally, if I ask something of a regular database (like the one I use on Linux), it returns only exact matches. That's the biggest reason I think it's close to the Pinecone vector database (I even thought it was exactly the same technology).
j
It's like general chat in here
"what is the main difference of using Cohere API (RAG specialized AI) compared to using Search the web in Botpress?"
Yes it is, finally! 🛠️ 🔥 🚀
q
when enough bot builders are working on similar projects together, solving issues and building projects, this is what it looks like
j
I can see that...
I'm turning off my emojis
it's too much for this damn laptop
You nerds enjoy this thread (I'll join you soon)
It's a compliment
a
Hi all, sorry for not answering (little kids things :-)). I'll add some more detail about what I have tested until now. 1) Knowledge Base from the Raw Input Card is not useful for me because in Spanish (not sure if this happens in other languages) most of the time we don't build a complete question. For example: recommend me Japanese restaurants. If there is no question, the Knowledge Base is not triggered. 2) Query Knowledge Base. This is what I was using in Botpress, and it works really nicely. This will be my first approach for local information (my users add information about places directly from the chatbot, so I can ask my local knowledge base first before using other API calls to external LLMs). 3) My biggest challenge is dynamic events from so many disaggregated sources. For example, in Malaga there are more than 100 sources to search for plans, activities, etc. Each of them uses its own format to present information, so it's very difficult for an LLM to get accurate and valuable information. For example, a simple question like "what can I do this afternoon with kids near the center?" implies for the LLM: 1) If the user says "this afternoon", first it needs to know what day it is today. No problem, I can send the date in the prompt. 2) "Near the center" implies that the LLM should filter by distance, so it needs to find the location of the events and check if it suits the user's needs. 3) "With kids" implies that the LLM should evaluate the descriptions of plans, events, and activities and decide if they're suitable for kids. And the most important thing: I don't want to get results like "You can go to a park, you can go for a walk with your kids" and information like that. I need specific plans and activities, because I know that those disaggregated information sources have specific and particular plans (I have been collecting this information by hand for several months). Hope this information can be useful to find the best ways to improve LLM capabilities. Thanks all of you for your comments and thoughts.
q
@agreeable-accountant-7248 "1) Knowledge Base from the Raw Input Card is not useful for me because in Spanish (not sure if this happens in other languages) most of the time we don't build a complete question. For example: recommend me Japanese restaurants. If there is no question, the Knowledge Base is not triggered." I have noticed the same. I solved it by adding a quick AI Task after the Raw Input and asking it to always change the input into a question. I did something similar here https://discord.com/channels/1108396290624213082/1231084649506148422/1231270538496970812 https://cdn.discordapp.com/attachments/1230732214891712674/1231602964699938816/image.png?ex=66266b24&is=662519a4&hm=7dfd2c73ed7924b8a41adbcf3ba7eb313978edcccdc671a06a1e8be4f47913aa&
c
this is my preferred method as well; there must be a specific prompt engineering term for it ha 😛 Btw, for prompt engineering take a look at: https://learnprompting.org/
a
I have seen the RAG video above, and Llama 3 + Pinecone vector database could be a potential winner. In the video the author comments that he has scraped 20,000 articles from the BBC. I could code a Python script to scrape, every day, those websites with info about plans and activities and upload it to Pinecone. Then I could query my own data first, and if nothing is found, I could use Llama 3 directly because it's very fast. Any tutorial about uploading data to Pinecone? I have no experience with vector databases. Thanks!
q
@agreeable-accountant-7248 "For example, a simple question like 'what can I do this afternoon with kids near the center?' implies for the LLM: 1) If the user says 'this afternoon', first it needs to know what day it is today. No problem, I can send the date in the prompt. 2) 'Near the center' implies that the LLM should filter by distance, so it needs to find the location of the events and check if it suits the user's needs. 3) 'With kids' implies that the LLM should evaluate the descriptions of plans, events, and activities and decide if they're suitable for kids." My thoughts: assuming we can use Llama 3 here and it gives quality responses, and being so fast that it can be prompted even 3-4 times in the time it takes to use GPT-4 once, I would also try prompt chaining, as you mentioned earlier. My ideas, which I can try and test, to get the AI to reply only with real results, not hallucinating or giving fake or too-general results (this is close to what you have already said): always send the response from the previous prompt (an API call like Llama 3, or AI Generate Text, or AI Task) to be used in the next prompt. First prompt to get all the results from a web search. Second prompt to double-check that the day is correct (today), the location is near the city center, and the activity is suited for families with kids; return results only if all those requirements match. Third prompt to instruct it again to offer only places that have specific and particular plans, instead of general recommendations. (Not necessary, but if nothing else works and the LLM is really fast, maybe even a fourth prompt could be used to double-check that the response after the third prompt really answers the user's question perfectly, including all the necessary elements, before sending the response to the user.)
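Here's a hedged sketch of that chain; callLlama is a hypothetical helper wrapping the same Groq request we used earlier, and workflow.searchResults is assumed to hold the web search output from the first step:
Copy code
js
// Hypothetical helper: one Groq chat completion call, returns the answer text
const callLlama = async (messages) => {
  const response = await axios.post(
    'https://api.groq.com/openai/v1/chat/completions',
    { messages, model: 'llama3-8b-8192' },
    { headers: { Authorization: `Bearer ${env.GROQ_API_KEY}`, 'Content-Type': 'application/json' } }
  )
  return response.data.choices[0].message.content
}

// 1) collect everything the web search found
const allResults = await callLlama([
  { role: 'system', content: 'List every event or place found in the search results, with dates and locations.' },
  { role: 'user', content: workflow.searchResults }
])

// 2) filter by date, distance, and audience; drop anything that fails a requirement
const filtered = await callLlama([
  { role: 'system', content: `Today is ${new Date().toDateString()}. Keep only events happening today, near the city center, and suitable for kids. Remove everything else.` },
  { role: 'user', content: allResults }
])

// 3) enforce specific, non-generic recommendations
workflow.llamaAnswer = await callLlama([
  { role: 'system', content: 'Return only concrete, named plans with times and places. No generic suggestions like "go to a park".' },
  { role: 'user', content: filtered }
])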
One useful Pinecone tutorial I can recommend, short, sweet, and to the point:

https://www.youtube.com/watch?v=z-_KHGkljkc

All those Pinecone videos are between 5 and 10 minutes long.
a
interesting approach. I will try something with prompt chaining with Llama 3. I have done a quick test with Botpress Website Search (100 URLs), and it works really well, as it only searches those websites, but at $0.09 per answer. My actual spend per month is approx. $142 for 24k user questions. With the cost of Botpress answers, my spend would be $2,279... 🤯 😱
q
I believe that everyone here appreciates your willingness to share your real testing experiences with a use case like this, as well as the actual costs involved with such a large number of users and queries. I'm sure that together we'll find the best solution 🦾 🦸🏻‍♂️ 🫡
@agreeable-accountant-7248 This is something that I have mentioned here too many times before, but I just try to be useful by providing some background information on how my solutions work in different fields. This is something I did with 18,000 blockchain users and autostaking, and it cut the costs for the company from hundreds of dollars to only a few dollars a day. I was using blockchain plus Linux servers instead of AI and vector databases, but I think you can do something similar here. Instead of fetching the entire blockchain and 18,000 accounts every time, I did it only once a day, automatically every 24 hours, and stored that data. Then, when we needed to count how much to pay daily to every account, we only fetched the much smaller one-day data instead of the entire blockchain. So in your use case, I don't know exactly what kind of activities users are looking for, but can you do a web search for all those 100 URLs one time, automatically every 24 hours, collect all useful info, and store that result? In Botpress you can do that by setting a CRON schedule. And when users ask questions, you then have only one place to look for the data, instead of searching 100 URLs every time. The initial web search is slower and larger, but you can perform it every 24 hours at the best time, when no one is asking questions. Then, for all those 24k questions, it's much faster and cheaper, since you don’t need to search 100 URLs; you only need to check one place (knowledge base, database, or server URL).
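A minimal sketch of that daily pattern (fetchAndExtract, EventsTable, and workflow.eventSourceUrls are hypothetical names for this example, not Botpress built-ins):
Copy code
js
// Runs from a CRON-triggered Botpress workflow once every 24 hours
const sources = workflow.eventSourceUrls // assumed array of the ~100 URLs

for (const url of sources) {
  const text = await fetchAndExtract(url) // e.g. axios.get + stripping the HTML
  await EventsTable.createRecord({
    url,
    content: text,
    scrapedAt: new Date().toISOString()
  })
}
// User-facing flows then query EventsTable (or a KB built on it)
// instead of re-searching all 100 URLs on every question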
j
Cool
q
Thanks for that image, Remo! 🍭
a
I'm going to test your solution. First make a web search over all the URLs, get the information I need, store it in a table inside Botpress, and then do a Query Knowledge Base on that table. Let's see what happens 🙂
q
Yesss! 🦾 🛠️ 💎 🙏 💯 I added some text to my previous message above.
Let's hope that it works well, so that you can cut costs significantly. Also, it's better if it already works with Botpress KB tables as we are used to, via cards (Insert Record, Find Records) or via code
Copy code
js
await Data1Table.createRecord({ /* column values for the new row */ })
await Data1Table.findRecords({ /* search/filter options */ })
If for some reason it doesn't work, Botpress now also has a KB Tables API. https://botpress.com/docs/api-documentation/#tables
And there's always the option to store the data in Google Sheets or Airtable, but let's hope that Botpress KB solves everything.
h
🔃
catching up........
f
Free to use, or where is it from?
I think that the client is a bit easier and better to use than the API 🙂
At least once the docs come out for it. Right now you have to use the API docs to use the client...
q
The reason for so many (150+ 🤣) messages is that if we can somehow together solve Mister @agreeable-accountant-7248's cost-related issue in his great use case, it really opens up many new possibilities for bot builders to use the same solution with large numbers of users.
f
A locally running LLM? I don't think it's the query to the web that is causing the high costs, right?
I just skimmed over the messages, so I could have missed something 🙂
q
@fresh-fireman-491 Great! 🦸‍♂️ 🛠️ 🫡 I was just about to summon you here to see if something could be done even better.
f
One thing I read was the locations issue
q
I'm guessing Mr. Iñaki is using Llama 3 with the Groq API here, but I could be wrong.
f
It has a limit if I am not mistaken. I don't think it would be suitable for that many users until a proper API comes out.
It's actually not that bad
Mine is 30 requests per minute https://console.groq.com/docs/rate-limits
14,400 a day
a
Hi Decay. This could be a good solution for restaurants and local places, but for specific plans, events, courses, local and popular parties... that information is not directly in Google. That's why people started asking us to show that type of information for local people. So I should be able to answer questions like:
- tell me a nice Japanese restaurant in the center of the city
- what can I do with my kids next weekend in a particular town
q
I've also checked those, in my account (free beta) it says 14,400 requests per day
a
yes, I saw the limits. Not for production right now. That's why I'm testing the Botpress approach of storing relevant information from those 100 URLs every day and then searching that local knowledge base when users ask something related to plans or activities
f
Ah okay. Is the chatbot just for 1 city or area, or is it global?
a
We have started with one city because we need to make sure the business is scalable and profitable
that's the reason I try to balance the cost and the quality of the information for the users
f
I can think of a few things here, but I think the best one might be to contact a company that stores these events. Here in Denmark we have something called Visit. Website for my city: https://www.visitfredericia.com Nearby city: https://www.visitodense.com They have all of the great restaurants, events, must-sees, places to stay, etc... If I were doing something like this for Denmark, I would work with them to try and find a solution. Hopefully you have something similar. Edit: Changed the links to be in English
a
Thanks for the info, Decay. Here in Spain we have a problem because the information is totally dispersed. Just in Malaga, where we have started, we have more than 200 websites with local events and plans. Malaga province has 103 municipalities, and each one of them puts its local info on its Facebook or Instagram page or its own website. That is the main reason for the startup. We started to gather all that information by hand every week, and we grew a community of 50k followers. The next step is to automate that work, but we know it is a real pain for users and tourists who want to do things or visit places that local people know, not only what Google can recommend. In two weeks our users have added more than 260 local restaurants to the bot, with their own opinions and descriptions. This is the type and quality of info I would like to offer to all of the users.
f
Ah okay, I see. That makes it a bit harder. What do you do to verify what the users added?
a
until the first investor round comes, I do it myself 🤣 . Then I would like to use an LLM to validate that the information is real and trustworthy.
c
It was from a cool medium.com article, Groq and Llama, but it was behind a paywall, urgh
That is a great initiative, Iñaki. I'm not surprised about its success because you are solving a real issue here. Did you look at my suggestion earlier about building your own database in favor of live web scraping?
a
Morning Remo. I should try to do something similar to your suggestion. Do you know if there is any AI for web scraping that can get content based on natural language? Because I have so many websites to scrape, and each one uses different tags and formats, so it would be tough to write web scraping code that works with all of them. Thanks in advance!
c
Morning Iñaki, the weather must be amazing in Malaga 🌍 Can't wait to travel again to Spain/Portugal and get some actual sun ☀️ For the web scraper: you can use a scraping service like webscraper.io that lets you scrape hundreds of sites per month for a decent price. It does not require code but still takes some time to set up properly. The alternative is to learn a bit of coding so you can fully automate the process. There might be additional steps needed to validate the data and "parse" it into different data formats.

https://youtu.be/76gF-V1k7JE?si=Je8wl51k2Ie4lN6J

j
Okay, quick question: what is this thread about?
Thinking of possible ways Llama can be used?
f
Llama and Iñaki's problem
a
The main topic has evolved into a real-time web-searching RAG LLM
q
You described it well too. I think the Llama 3 tutorial started after message number 40, when Takudzwa Billions found the correct API for us, and the tutorial part lasted only 10 messages. Before that, it was about practicing using Llama 3, and after that, exploring use cases where we need the fastest LLM.
j
okay cool
q
⬆️ title text cannot be longer, otherwise I would have added "...solve many other issues in your project along the way (and create a few new ones)"
l
Would you be able to talk a little more about how you captured the chat history and sent it back to Llama 3 so that it has memory of the conversation?
a
Sure!! Now I’m leaving for an event, but tomorrow I will show you the code for chat history with Llama 3 (it's almost the same as what I used with Perplexity)
l
You are awesome. Thank you so much!
q
@agreeable-accountant-7248 One more option for us to consider with Botpress is OpenAI Assistants API v2, now that it seems to be much better, much faster, and also much cheaper 🔥 I haven’t tested it a lot yet, but I just started https://discord.com/channels/1108396290624213082/1232193993836855347
a
Hi. This is the code that I use to give chat history to Llama 3 and other similar LLMs. The structure must always start with the system message and then alternate user question and assistant answer, like this: - System - User - Assistant - User - Assistant
and my code:
Copy code
js
const GROQ_API_KEY = env.GROQ_API_KEY

const productInfo = [] // here go all messages between the user and the LLM

workflow.preguntaPerplex = event.payload.text // here goes the prompt construction with the user query

workflow.arrayPreguntasUser.push(workflow.preguntaPerplex)

// the system message always goes first
const systemPerplex = {
  role: 'system',
  content: 'the role of the assistant.'
}
productInfo.push(systemPerplex)

if (workflow.arrayPreguntasUser.length == 1) {
  // first turn: just the new user question
  productInfo.push({
    role: 'user',
    content: workflow.preguntaPerplex
  })
} else {
  // later turns: replay the whole history, alternating user and assistant
  workflow.arrayRespuestasChatbot.push(workflow.respuestaIA2)
  for (let i = 0; i < workflow.arrayPreguntasUser.length; i++) {
    productInfo.push({
      role: 'user',
      content: workflow.arrayPreguntasUser[i]
    })
    if (i < workflow.arrayRespuestasChatbot.length) {
      productInfo.push({
        role: 'assistant',
        content: workflow.arrayRespuestasChatbot[i]
      })
    }
  }
}

const data = {
  messages: productInfo,
  model: 'llama2-70b-4096',
  temperature: 0,
  max_tokens: 10000
}

const main = async () => {
  try {
    const response = await axios.post('https://api.groq.com/openai/v1/chat/completions', data, {
      headers: {
        Authorization: `Bearer ${GROQ_API_KEY}`,
        'Content-Type': 'application/json'
      }
    })
    workflow.respuestaIA2 = response.data.choices[0].message.content
    return workflow.respuestaIA2
  } catch (error) {
    console.error(error)
    throw error
  }
}

await main()
I'm sure it's not the best code ever, but I make so many tests that I need to go fast :-p Feel free to ask me anything!
thanks for the info, I will review the post!!
l
That's SUPER cool. Thank you!
q
"Do you know if is there any AI for web scraping" Take 22 seconds to see how Browse AI can save you hundreds of hours.

https://www.youtube.com/watch?v=tXdp4m3y5KU

@agreeable-accountant-7248 Check this link, it's one more idea on how to ALWAYS get only the correct data from websites to Botpress Knowledge Bases https://discord.com/channels/1108396290624213082/1223909098974875658 Just build it once to work correctly with all the required websites, and then automate it to perform the task and collect information from there every 24 hours. Some good reasons to use Node libraries like Puppeteer with Botpress: with them you have total control of the info you need to provide to users from the website, and you control when the info is scraped from the webpage.
1. We don't need any external libraries to build really good chatbots for many real use cases that companies and clients need. But some use cases might need more functionality, and then we can use NPM packages and leverage the work of 17 million developers to build even more advanced Botpress chatbots.
2. You have total control of the info you need to provide to users from the website. You can decide what information you need to get from websites; in the first example, I used headings (titles) and paragraphs (text), and then the chatbot summarizes those for the user. In the second example, using YouTube search, I took the video title, channel name, image URL, and link to the video, then displayed those to the user. Also, you control when the info is scraped from the webpage, for example, every time someone uses the chatbot (so the information is not hours or days old).
3. You can let users add webpages to the Knowledge Base. You can ask users to provide a webpage, then scrape the webpage data and store the result in the KB table. Then ask the user if they want to add more webpages (maybe news, research, etc.) or if they want to start asking questions about the new info (not only summarize it, but ask any question related to it).
4. You can do many of the same things with APIs too, but they often cost money. Web scraping and the use of other external Node libraries are free with serverless functions (125k function calls and 100 hours of execution time a month). If your users exceed those limits, then you'll have enough money to pay for the serverless functions PRO version (2M function calls, 1,000 hours of execution time a month, $19 on Netlify).
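For reference, a rough sketch of the Puppeteer pattern described above, as it would run in a serverless function (not in a Botpress card); the URL and selectors are placeholders:
Copy code
js
const puppeteer = require('puppeteer')

const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://example.com/events', { waitUntil: 'networkidle2' })

// Grab headings and paragraphs, like in the first example mentioned above
const scraped = await page.evaluate(() =>
  Array.from(document.querySelectorAll('h1, h2, p')).map((el) => el.innerText)
)
await browser.close()

// Store the result so it can be summarized or written to a KB table
workflow.scrapedText = scraped.join('\n')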
f

https://youtu.be/ttxA8QjkvCA?si=gJVSyfge5oBXiru9

q
As we expected, here's what's coming: This model, developed by Gradient and sponsored by compute from Crusoe Energy, extends Llama 3 8B's context length from 8k to over 1,040,000 tokens, allowing it to process significantly longer text passages for better understanding and generation. 🦾 🦙 🛠️ Two weeks ago: "since they've already solved this fine-tuning part, we can also believe that many AI builders are working hard on techniques for API calls and end users as well, so we don't need to ask questions and request help from the '8K bimbo'." Link to run it locally: ollama.com/library/llama3-gradient https://cdn.discordapp.com/attachments/1230732214891712674/1235639961878007860/image.png?ex=66351ae3&is=6633c963&hm=0226b3e87ce2643e2e9ec2f1e85d400955b349e5a8e2c87d8497a47f809558cf&
f
Very very cool!! Are there any "needle in a haystack" tests?
q
Yes, it was performing well in those.
from their devs: After releasing the first Llama-3 8B-Instruct on Thursday with a context length of 262k, we have now extended Llama to 1048K / 1,048,576 tokens on Hugging Face! This model is part 2 of the collab between gradient.ai and crusoe.ai. As many suggested, we also updated the evaluation, using ~900k unique tokens of "War and Peace" for the haystack. Also, the success of the first model opened up some GPU resources, so we are now running training on 512 GPUs, using a derived version of zigzag-flash-ring-attention for training. Link to the model (Llama 3 license) and test results: huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k
comments: "Wow what a milestone. Wild what the open-source community can do! This is amazing!"
f
Amazing!!!
f
Yeaa