Using the new 'Vision Agent' to analyse an image s...
# 🤝help
l
Goal: Get an image sent by a user on WhatsApp described by the agent, with the description saved as a variable.
The logs say I've got an error in the OpenAI part (I think):
issue id: iss_01HVRE0HSMVKFPRE0B14PQV7HW
message: An error occurred while executing an agent action: [object Object] (Status Code: 400)
I have left the image URL blank. I have created a variable for 'store answer in variable'. I then have an 'AI task' card to generate some advice based on the response from that variable.
Vision agent __has__ proven to work when there is a URL in the 'image url' field of the 'Extract content from image' card.
https://cdn.discordapp.com/attachments/1230521763847209021/1230521763994140702/Screenshot_2024-04-18_at_15.09.04.png?ex=66339fb2&is=66212ab2&hm=4c38860d256fbf31b6c156e7b97b59633b5091aacd3a3f6e244265179acff62d&
https://cdn.discordapp.com/attachments/1230521763847209021/1230521764304654408/Screenshot_2024-04-18_at_15.10.48.png?ex=66339fb2&is=66212ab2&hm=d409629007db45077f71f793861ef91c4dc4c7645715c93b1bfc5abb945153ac&
f
It can't run with the image url as blank
It needs an image to analyse
l
Sylvain mentioned: "If the incoming message is of type image, then the image will be extracted automatically. It also supports extracting from links." If this isn't the case, is there a way to generate a link from the user's WhatsApp message/input, maybe?
j
Hey man
you need to save the url into a variable
and put it in the image url box
example:
bot: What image can I look at today?
user: example image.com
Bot: saves image URL into variable
Bot: provides info about that image
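A rough sketch of the "save the URL into a variable" step, for an Execute Code card. It assumes the incoming WhatsApp event exposes the link as `event.payload.imageUrl` and that the workflow variable is called `testimage`; `pickImageUrl` is a made-up helper name, not a Botpress function.

```javascript
// Hypothetical helper: returns the image URL carried by an incoming event,
// or null when the message is not an image.
function pickImageUrl(event) {
  if (
    event &&
    event.type === 'image' &&
    event.payload &&
    typeof event.payload.imageUrl === 'string'
  ) {
    return event.payload.imageUrl
  }
  return null
}

// Inside an Execute Code card this might be used as (assumed variable name):
// workflow.testimage = pickImageUrl(event)
```

The variable can then be referenced in the card's image URL field as `@workflow.testimage`.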
l
Ok looks like the answer is in there, thank you. My no code status is about to get tested 😂 Thanks Decay
I think you mean what Decay has sent, right? Creating a variable from a user's input?
f
Try and console.log the link that you are extracting
Might be the issue
j
Yeah
l
f
Try with just the URL
Maybe @bumpy-addition-21507 or @limited-pencil-78283 could help here
b
Tbh, @fresh-fireman-491 is right on the money here. That should work
f
Amazing! Thank you
l
So just... https://lookaside.fbsbx.com/ or will I need the brackets too?? The original one was: {"imageUrl": "https://lookaside.fbsbx.com/...."}
f
Neither should work
You need the last part
You can get it like this:
```javascript
const whatsappAccessToken = env.WHATSAPP_ACCESS_TOKEN

// The media URL is not public, so the request must carry the access token.
// responseType 'arraybuffer' makes axios return the raw bytes instead of parsed text.
const res = await axios.get(event.payload.imageUrl, {
  responseType: 'arraybuffer',
  headers: {
    Authorization: `Bearer ${whatsappAccessToken}`,
  },
})

// This will be a JavaScript Buffer (https://nodejs.org/api/buffer.html) containing the raw binary content of the media file.
const rawFileContent = res.data

// This will indicate the file type, see:
// https://developers.facebook.com/docs/whatsapp/cloud-api/reference/media/#supported-media-types
const mimeType = res.headers['content-type']
```
l
OK, I created the Configuration Variable named "WHATSAPP_ACCESS_TOKEN" in the bot settings. Do I need to enter this code somewhere as well? Sorry dude. I don't envy you having to deal with no-code warriors like me. I appreciate the pain this must cause.
f
No worries at all. I haven't really used WhatsApp that much, which is why I pinged BattleSynth and Lijo. I was hoping that one of them could help us here.
l
No worries. I guess there will be a way to change that url entry part to be whatever the user sends in. I doubt it would be made so that you have to have the same url in place for every user
l
@fresh-fireman-491 - haven't read the thread yet, will come back in a while
Which card is this?
l
Ah I have never used it
l
Yea I think it's pretty new. If I can get it to work it'll open a ton of opportunity. I don't think there is any other way for me to allow an AI to analyse the contents of an image which a WhatsApp user sends in. It looks like this card could do it. We are nearly there. I think it's an issue with the url part. Sylvain did say that can be blank earlier on though
j
What's wrong?
f
Hey there Theo 👋 It's with WhatsApp. You can read about it here in this post.
j
oh
h
Hey there @limited-library-71452. I have successfully integrated vision into my WhatsApp, but I am using Claude instead of OpenAI. I used a thread from Decay in tutorials titled How to use Claude 3 in Botpress or something like that.
Are you able to extract the base64 string and file type from your URL?
Here is a sample workflow and the code. You listen for input by checking event.type === 'image' (in the "user sends image" expression card). If it is an image, you send it to the vision workflow, where you can extract the base64 string and the image type. Once you have those saved in variables, you can post them to the Vision API in the second card in the Vision node and display the response from the API in the text card.
I am also no-code, and it took me a month to force myself to understand what is happening here. You are on the right path and you will crack it, just keep trying. You will learn more in the process.
https://cdn.discordapp.com/attachments/1230521763847209021/1230791169957691503/Screenshot_1709.png?ex=66349a99&is=66222599&hm=4438cedb8f3ee338bcd1d2ea22350f269d6413b526e79445092c1944a12867f7&
https://cdn.discordapp.com/attachments/1230521763847209021/1230791170230325270/Screenshot_1710.png?ex=66349a99&is=66222599&hm=47cb69de837436356bda87725b0b77fac850c605474c6190e011b16c9e01a434&
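The code in those screenshots is not reproduced in the thread, so here is only a guess at what the "extract the base64 string and the image type" step could look like. `toBase64Image` is a hypothetical helper; it assumes the image bytes and the `Content-Type` header have already been downloaded, and that Node's global `Buffer` is available in the Execute Code sandbox.

```javascript
// Hypothetical helper: turns downloaded bytes plus the Content-Type header
// into the two values the vision workflow stores in variables.
function toBase64Image(rawBytes, contentTypeHeader) {
  // Drop any parameters such as "; charset=binary" and normalise case.
  const mediaType = String(contentTypeHeader || '')
    .split(';')[0]
    .trim()
    .toLowerCase()

  if (!mediaType.startsWith('image/')) {
    throw new Error(`Expected an image, got "${mediaType || 'unknown'}"`)
  }

  return {
    mediaType,
    base64: Buffer.from(rawBytes).toString('base64'),
  }
}

// Possible use after an authenticated download (variable names are assumptions):
// const { mediaType, base64 } = toBase64Image(res.data, res.headers['content-type'])
// workflow.imageType = mediaType
// workflow.imageBase64 = base64
```

Rejecting non-image content types early means a failed or expired download shows up as a clear error rather than a confusing response from the vision API.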
l
Thanks so much for this info Takudzwa. Can’t wait to try this out. This will be huge if I can get it going 💪
Appreciate the help already, but could I ask one more question? Can I confirm that you didn't actually need/use the 'extract content from image' card? You just sent the image the user sends in via WhatsApp directly to Claude/OpenAI via an execute code card instead?
I'm a little behind you in this part. You learned a lot in that month I think! 🧠
h
I think this card just got released recently and you are the first person I am seeing talking about this card.
Please see the code I shared above. Your image content will be in the imageUrl, so you can get your image content there (there will be no need for the extract content card).
l
Ok, understood. Thank you. Yea, not sure I'm meant to have it tbh. It doesn't seem to do what I need, as I need to generate the URL on the fly as the user sends an image in. I'll experiment with the code. Thank you again.
This coding stuff is hard. Which is the reason I came to Botpress in the first place 😂 I have temporarily gone back to the 'easy' method of using this new "Extract Content from Image" card, and keep getting this error. To save me from continuing down the learning-code-from-scratch route, has anyone successfully used this card before (without having a URL to use in the card's standard fields)? https://cdn.discordapp.com/attachments/1230521763847209021/1231975920525512724/Screenshot_2024-04-22_at_15.28.29.png?ex=6627c67c&is=662674fc&hm=ee48a0e6f19d94e930435f39ccd943534f5cd36774d3f59c9149abdbacd5190d&
h
hey bro how is it going now? Any success?
l
Hey dude, not quite yet no. My coding knowledge is almost zero so I had to continue trying to use the "extract content from image card". The vision workflow always seems to fail without the url in the card though. I tried variables and tried to extract a url from the whatsapp image. Also now trying to use Zapier to do the image analysis, which it does well, and webhook from botpress to zapier and back to botpress for that part. Getting the image file/info/url from whatsapp is tricky still. I will maybe need to learn more/some code, seems like javascript, first
I will not let this beat me though!
h
Seems like you are going through exactly what I went through. It literally took me a month to get this to return a successful response, but I also learnt loads from the trial and error. Keep trying, you will crack it and will be better by the time you are done.
I will also try to test the "extract content from image" card this weekend and will let you know if I learn something from it.
Let's see a screenshot of your code? The one you are using to read the imageUrl
l
Any luck? I can set up a variable for the url and use the variable in the url field of the card. If I prefill the variable then the AI does understand and does extract the content from the image. BUT I still can't get the variable filled with the url of the incoming message image. I've been using ChatGPT to help with code and I actually think it's doing it in a different language. To find the WhatsApp image url and then save it as the variable @workflow.testimage It gave me this... Which even I can tell is way off...
h
Hey bro, still haven't managed to check out the card. For code, can you try using Claude? I haven't been using ChatGPT in a while, but I believe you can do better with Claude 3. Explain your whole scenario to it, give it the code you currently have, and ask it to improve it.
Also, you can copy the code I attached here as it is and just activate the variables in your account, and I am sure you can make some progress.
l
Dude, I've done some pretty extensive work with Claude and ChatGPT now. Still no luck, I'm afraid. They've iterated loads on two code blocks: one to extract the mediaID and the other to fetch the URL. They seem to think it's all about this error: "An error occured while running Execute Code card: temp is not defined". Don't suppose you (or anyone else) know of anyone who is making use of this "extract content from image" card yet, do you?
f
Use bot.WHAT..... for the token
??
Ok it looks like the Access token is ok and we are generating a URL. It's just not saving to the variable required. Is this enough to save the URL as the variable @workflow.testimage for use in the "Extract content from image" card? Seems like there should be more
This is an example image URL from the incoming WhatsApp message: https://lookaside.fbsbx.com/whatsapp_business/attachments/?mid=1148705659495348&ext=1714655238&hash=ATuiSTp7GKQDuhLNbEqJH1XgenQMd3dNF9DGu2xKgT9OCQ Can this URL even be analysed via the "Extract content from image" card? Is it public? I think you mentioned before that it has to be a *public* URL.
Inserting it into a browser leads to a Meta error message.
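A browser sends no access token, which would explain the Meta error. As a hedged sketch, the request for a lookaside URL like the one above could be described with a small config builder. `buildMediaRequest` is a made-up name; the `Bearer` header and `env.WHATSAPP_ACCESS_TOKEN` follow the snippet shared earlier in the thread, and `responseType: 'arraybuffer'` is a standard axios option for binary downloads.

```javascript
// Hypothetical helper: builds an axios request config for a WhatsApp media URL.
// Without the Authorization header the URL cannot be read.
function buildMediaRequest(imageUrl, accessToken) {
  if (!accessToken) {
    throw new Error('Missing WhatsApp access token')
  }
  return {
    url: imageUrl,
    method: 'get',
    responseType: 'arraybuffer', // raw bytes rather than parsed text
    headers: { Authorization: `Bearer ${accessToken}` },
  }
}

// Possible use in an Execute Code card:
// const res = await axios(buildMediaRequest(event.payload.imageUrl, env.WHATSAPP_ACCESS_TOKEN))
```

Because the link only works with the token attached, pasting it into a card that expects a public URL is unlikely to work as is.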
h
Hey bro, great progress if you can get the URL. I think the next thing will be to extract the file type from the URL and turn the data into a base64 string. Once you can extract those, you have all you need to make an API call to Claude.
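For reference, a minimal sketch of how the base64 string and file type could be packaged for Claude. The endpoint, headers and image block shape follow Anthropic's Messages API; `buildClaudeVisionRequest`, the model choice and the `max_tokens` value are assumptions for illustration, not what Takudzwa's workflow necessarily uses.

```javascript
// Hypothetical helper: assembles the URL, headers and JSON body for a
// Claude vision request from a base64 image and its media type.
function buildClaudeVisionRequest({ apiKey, base64, mediaType, prompt }) {
  return {
    url: 'https://api.anthropic.com/v1/messages',
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: {
      model: 'claude-3-haiku-20240307', // assumed model; swap as needed
      max_tokens: 512, // assumed limit
      messages: [
        {
          role: 'user',
          content: [
            { type: 'image', source: { type: 'base64', media_type: mediaType, data: base64 } },
            { type: 'text', text: prompt },
          ],
        },
      ],
    },
  }
}

// Possible use in an Execute Code card (key name is an assumption):
// const req = buildClaudeVisionRequest({ apiKey: env.CLAUDE_API_KEY, base64, mediaType, prompt: 'Describe this image.' })
// const res = await axios.post(req.url, req.body, { headers: req.headers })
// const description = res.data.content[0].text
```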
l
Thanks. I really want to try and use the "extract content from image" card ideally. Feels like that would be simpler. Less that can go wrong. Although I can't actually get it to work yet 😂
h
You really should continue with the card and be successful so that you can post a tutorial for all of us to learn how to use the card
l
😂 I'll keep trying. I think the issue is currently that the links created by a WhatsApp image message are not 'public'. The URLs for use in the "extract content from image" card need to be public, apparently. If that is the case then my use case is impossible in Botpress.
f
Hey there I am probably starting a new project soon where I will need a file from WhatsApp. I am saving this post so I can come back if I start the project and figure it out 🙂
l
Awesome. Really appreciate it Decay. There will be an answer 🙌
f
Could be
h
hey Michael
how has it been going? any progress
l
Well, I thought it was over because of WhatsApp's end-to-end encryption, BUT the latest is that I used Postman to test whether a GET request was even able to actually 'see' the image URL of an incoming WhatsApp image. And it DID when sent with the authorisation token. It shows up in the body of the response. So I just need to somehow have that GET request sent to an AI API, work out how to make it see the body of the response, and then find out how I can see the result of what it sees. Then it might work. Also found out that those WhatsApp image URLs potentially only last for 5 mins. I've randomly been banned from Claude whilst testing it, unfortunately. Not sure why.
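One way to picture the whole chain described here, as a sketch only: download the image straight away with the token (since the link may expire quickly), then hand the bytes to whichever vision API is used. `describeIncomingImage`, `httpGet` and `describe` are hypothetical names; the two functions are passed in so the flow can be tried without real network calls.

```javascript
// Sketch: authenticated GET first, then pass the response body to a vision model.
// `httpGet(url, config)` should behave like axios.get; `describe` is whatever
// function calls the AI API and returns its text answer.
async function describeIncomingImage({ imageUrl, accessToken, httpGet, describe }) {
  const res = await httpGet(imageUrl, {
    responseType: 'arraybuffer',
    headers: { Authorization: `Bearer ${accessToken}` },
  })

  // The response body is the image itself; the header says what kind of image.
  const mediaType = String(res.headers['content-type'] || '').split(';')[0].trim()
  const base64 = Buffer.from(res.data).toString('base64')

  return describe({ mediaType, base64 })
}

// Possible use in an Execute Code card (callVisionApi is a placeholder):
// workflow.testimage = await describeIncomingImage({
//   imageUrl: event.payload.imageUrl,
//   accessToken: env.WHATSAPP_ACCESS_TOKEN,
//   httpGet: (url, config) => axios.get(url, config),
//   describe: callVisionApi,
// })
```

The result stored in the variable is the model's description, which is the "result of what it sees" mentioned above.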
c
hi @limited-library-71452
Any progress? I'm having the same problem as you.
hi can you share the remix of this flow?