Solve Coding Challenges
# 📖tutorials
q
Chatbot workflow
1. Get a coding challenge from the chatbot user.
2. Attempt to solve the challenge with GPT-4.
3. Send the solution to other LLMs (Claude-3, Mixtral-8x7b, and Llama2-70b) for improvements.
4. Ask GPT-4 to rate the different solutions (1-5 stars).
5. Display all solutions (original and 4 improved solutions) and their ratings to the chatbot user.

Show the project to the client and repeat the process from steps 1-5 until the deal is secured.

I'll add all the code, prompts, etc., and upload the bot file after I have tested enough. I'm going to build it so that when other bot builders are testing or start using it, you just replace the API keys with your own and you're good to go.

Test it by solving coding challenges from edabit.com (10,000+). If the best solution is not working, try the other solutions, or send the solution with the error message and ask the LLMs again to solve it (and provide the next 5 solutions), now that they know the error. Most LLMs can solve up to 'Hard', but not 'Very Hard' and 'Expert' level coding challenges. If a multi-LLM chatbot can solve those as well, we all have clients.
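Steps 2-5 of the workflow could be sketched roughly like this in Python. This is only a sketch under assumptions, not the actual bot: `LLM`, `Candidate`, and `run_workflow` are made-up names, each model is represented by a plain function that takes a prompt and returns text (in a real bot it would wrap an API call), and each improver is assumed to receive the most recent solution rather than the original.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in: any function that takes a prompt and returns the reply.
LLM = Callable[[str], str]


@dataclass
class Candidate:
    author: str     # which model produced this version
    code: str       # the proposed solution
    stars: int = 0  # rating from 1 to 5, filled in by the rater


def run_workflow(challenge: str, solver: Tuple[str, LLM],
                 improvers: List[Tuple[str, LLM]], rater: LLM) -> List[Candidate]:
    # Step 2: first attempt by the main model.
    solver_name, solve = solver
    candidates = [Candidate(solver_name, solve(f"Solve this challenge:\n{challenge}"))]

    # Step 3: every other model tries to improve the most recent solution.
    for name, improve in improvers:
        previous = candidates[-1].code
        prompt = f"Challenge:\n{challenge}\n\nImprove this solution:\n{previous}"
        candidates.append(Candidate(name, improve(prompt)))

    # Step 4: ask the rater for 1-5 stars; take the first digit in its reply.
    for candidate in candidates:
        reply = rater(f"Rate this solution from 1 to 5 stars:\n{candidate.code}")
        digits = [ch for ch in reply if ch.isdigit()]
        candidate.stars = min(5, max(1, int(digits[0]))) if digits else 1

    # Step 5: return everything, best rated first, ready to display.
    return sorted(candidates, key=lambda c: c.stars, reverse=True)
```

Passing the models in as functions keeps the sketch independent of any provider SDK, so swapping API keys or models would only touch the wrappers.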
@glamorous-guitar-39983 and others who love to solve things with code, even though it's much easier to just hand it off to AI Task: Here's why I'm trying to build a super coding assistant for bot builders
Coding challenge: The Fiscal Code

Each person in Italy has a unique identifying ID code issued by the national tax office after the birth registration: the Fiscal Code (Codice Fiscale). Check the Resources tab for more info on this. Given an object containing the personal data of a person (name, surname, gender and date of birth), return the 11 code characters as a string following these steps...
https://cdn.discordapp.com/attachments/1229502623883858022/1229825725834723340/The_Fiscal_Code.txt?ex=66311776&is=661ea276&hm=3d3ffd2e48dfe95244767b7ebf71cf8234673c76f1eef3ac4f51d8d4bcb8689e&
GPT-3.5 (the normal ChatGPT everyone uses) couldn't solve it with just one ask. It might fix the code if the user provides all error messages and clear instructions, or maybe not. https://cdn.discordapp.com/attachments/1229502623883858022/1229826232309256212/gpt-3.5.png?ex=663117ee&is=661ea2ee&hm=28972595e69e15a6a68773a999faee7a33f66e11536cc6fdedf791c77364bd55&
GPT-4 solved it on the first try (it's much smarter and better at coding since the latest update).

Normally, I code all day at work. A year ago, a tough coding problem might have taken me a week to solve through trial and error, searching the internet, etc. Now it takes only hours, or at most a day, with GPT-4's help. I'm mostly interested in complex problems involving multiple platforms and libraries, where I run into issues that even GPT-4 can't solve alone. Those are the ones I'm trying to solve with this tool.

I first try to solve the problems I couldn't fix with GPT-4, then pass the problem and solution to Mixtral 8x7b for verification. After that, GPT-4 considers any correction suggestions and makes changes if needed, and this process repeats with Llama2-70b and Claude-3. The final corrections are always made with GPT-4 if possible, and then I send the answer back to the chatbot user.

In the end, I might have four different solutions: the original from GPT-4 and three improved versions, each rated (1-5 stars) on how good the AI thinks it is compared to the others. I then test all four solutions. If a solution works, I move on; if not, I send it back to the Coding Agents with the error messages for another round, now with more info on what’s not working.
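The "test, then send back with error messages" round could look something like the sketch below. It assumes the candidate solutions are Python source strings that define a function with a known name; `check_solution`, `first_working`, and `retry_prompt` are made-up helper names, and the prompt wording is only an example.

```python
from typing import Any, List, Optional, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected return value)


def check_solution(code: str, func_name: str, tests: List[TestCase]) -> Optional[str]:
    """Return None if every test passes, otherwise an error message for the LLMs."""
    namespace: dict = {}
    try:
        # Caution: exec runs arbitrary code; use a sandbox for untrusted solutions.
        exec(code, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            actual = func(*args)
            if actual != expected:
                return f"{func_name}{args!r} returned {actual!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def first_working(solutions: List[str], func_name: str, tests: List[TestCase]):
    """Try solutions in order (best rated first); collect failures along the way."""
    failures = []
    for code in solutions:
        error = check_solution(code, func_name, tests)
        if error is None:
            return code, failures
        failures.append((code, error))
    return None, failures


def retry_prompt(challenge: str, code: str, error: str) -> str:
    """Build the follow-up prompt for the next round, now including the error."""
    return (f"Challenge:\n{challenge}\n\nThis solution failed:\n{code}\n\n"
            f"Error message:\n{error}\n\nFix the solution.")
```

If `first_working` returns `None`, each `(code, error)` pair in `failures` can be turned into a `retry_prompt` and sent through the models again.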
You can of course do the same by repeatedly sending the solution with its error messages to GPT-4 until it finally cracks it. But I also want to test two things here: whether having multiple different LLMs correct each other’s solutions is the ultimate fix (as studies suggest), and whether even the cheapest models, like open-source models and GPT-3.5, can outperform the latest models used alone when combined like this.
GPT-4's latest update is really good at solving those. We have to keep in mind that if those same coding challenges were included in its training data, it solves them easily, and that’s really good for us. But a task doesn’t even have to be an 'Expert' level coding challenge: if it’s something from your own project rather than a public coding challenge, sometimes even 'Medium' level tasks are hard for both GPT-3.5 and GPT-4, or maybe I just can’t give good enough instructions. In those cases, I’d like to see if multiple LLMs working together can also solve them easily.

And here, I’m not just giving the same problem to GPT-4, Mixtral 8x7b, Llama2-70b, and Claude-3, asking each to solve it, and comparing the results. Instead, I give it first to GPT-4, then each subsequent LLM tries to correct and improve the previous LLM's solution, and then I compare all of them. It gives much better results this way than every LLM trying to solve it from scratch.
It took 3 hours, but I finally found an Expert level coding challenge that even GPT-4 couldn’t solve, not even after a couple of attempts with a few error messages to help. This is the first one I can use to start testing this Coding Assistant. https://edabit.com/challenge/ZLTwdq8n5HK7DP9Eq https://cdn.discordapp.com/attachments/1229502623883858022/1229854310712606780/image.png?ex=66313215&is=661ebd15&hm=e2edb6af1e42ea7858314cbcf9349c2fe961f5f54c16e098c5d788b7c17491c9&
⬆️ Botpress Coding Assistant solved that using only GPT-4, in a single attempt, with the correct system prompt (written for coding in general, not just for this code challenge). Let’s try to find a few more examples where GPT-3.5 and GPT-4 fail, but using multiple LLMs solves the issue. https://cdn.discordapp.com/attachments/1229502623883858022/1230183425202192454/Screenshot_from_2024-04-17_18-40-21.png?ex=66326498&is=661fef98&hm=2e3abc9b529c5eeace676d80b9c963c6a34579802bbcd1955b5da027c1ed8109&
g
Maybe that's why I have a hard time with it when asking about code lol
q
Yes, that’s exactly why we need to build these tools for bot builders 🛠️