The Google Suite offers a wide variety of services that you can access through its comprehensive APIs. You can find a full list of all the available Google APIs here:
This article is meant to help you get started working with the Google Cloud Vision API using the REST action in Foxtrot. Learning how to utilize the REST action in Foxtrot enables you to integrate with third-party services, allowing you to perform powerful and advanced actions such as image analysis, email automation, etc.
The Cloud Vision API enables you to understand the content of an image by encapsulating powerful machine learning models via REST. It quickly classifies images into thousands of categories (such as "sailboat"), detects individual objects and faces within images, and reads printed words contained within images.
To read more about the Cloud Vision API, explore this:
IMPORTANT: The Cloud Vision API is free up to a certain usage limit. Make sure to read the pricing information in the following article. To use the Cloud Vision API, you will have to set up a billing account at Google, which is completely independent of Foxtrot.
Also, it is important that you carefully read the data usage FAQ to fully understand how Google handles the data and files that you upload while utilizing the Cloud Vision API. Again, this is completely independent of Foxtrot, as Foxtrot is only facilitating the communication with the Cloud Vision API.
IMPORTANT: If you are working with PDF files, you can use the "Poppler Utility Library" (see the article referenced below) to convert your PDF files to images so that you can perform OCR using the Cloud Vision API:
Setting up the API project
Google has also made this guide on how to enable the Cloud Vision API.
The very first step is to go to the following website and make sure that you are signed in to the correct Google account: https://console.developers.google.com/
Here, select an existing project or click to create a new project. Give the new project an appropriate name like "Foxtrot Automation". After creating the project, you need to make sure that billing is enabled for your project. This is required in order to use the Cloud Vision API. Remember, you will not pay unless you exceed a certain number of requests. Read more about how to enable billing via the link below.
When the project is created, click “Enable APIs and services” and enable Cloud Vision. At this step, if you would like to work with any other APIs or services, you can enable them as well.
Generate API key
Now, you should have a project with billing enabled and the Cloud Vision API enabled. It is now time to generate an API key. This will be the key that you will use to authenticate your REST calls. To generate an API key, go to Credentials and click to create an API key.
To better understand how to use API keys, read this article.
Give the API key an appropriate name. It is up to you to decide the type of restriction (if any) you wish to set up. For the first round of testing, you may decide not to set up any restrictions to get started quickly. You can always go back to the API key and implement restrictions later. Click Save to complete generating the API key. On the Credentials page, you will now see an API key. This will be the key that you will use to authenticate any REST calls in Foxtrot.
Use the Cloud Vision API
Before heading into Foxtrot, keep this link at hand as an ongoing reference for the different request types available in the Cloud Vision API.
Everything explained in the article, along with the images provided, can be downloaded at the very end of the article. You are now ready to utilize the Cloud Vision API. Start by opening Foxtrot and creating the following variables:
- APIKey
- RequestBody
- RequestURL
- ImagePath
- Output
- Status
Input the API key generated in the Google console in the APIKey variable. It is now time to perform the first request. To get an overview of how this works, read this article.
Image labelling
For this first example, we will analyze an image. As explained in the link above, there are a few ways of uploading an image. When working with Foxtrot, you will typically have the image saved somewhere on the machine or in a network folder. Therefore, we will use this method for this first example. Let us start with this fun image of an animal, a goat.
You can download the image file at the very end of the article. You can also decide to find your own image of an animal. For this first example, we will use the feature called Label Detection to have Google analyze the image and tell us what it is (what animal it is). You can read more about Label Detection here.
Make sure to save the image file somewhere appropriate on your PC. In Foxtrot, set the value of the ImagePath variable to be the path to the image file.
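For example, the value of ImagePath could look something like this (a hypothetical local path; use wherever you actually saved the file):
C:\Foxtrot\Images\goat.jpg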
It is now time to construct the RequestBody variable. The process of providing a locally stored image file via an API is quite complicated. Luckily, Foxtrot offers a built-in feature that does this very easily for you. To read more about the process of converting images with Base64 encoding, read the following article.
To do this in Foxtrot in the REST action, you simply have to surround your image path in the request body with "{EncodeBase64*}" and "{*EncodeBase64}". Then Foxtrot will do all the magic for you! So in this example, you need to set the variable RequestBody to the following.
{ "requests":[ { "image":{ "content": "{EncodeBase64*}[%ImagePath]{*EncodeBase64}" }, "features": [ { "type":"LABEL_DETECTION", "maxResults":1 } ] } ] }
The above RequestBody will take the image file defined in the ImagePath variable, convert the binary image to base64-encoded text, and submit the image to the Cloud Vision API, requesting label detection with one result. Setting maxResults to 1 means that we are only interested in the label that Google considers most likely to be correct.
If you copy-paste the above into your Foxtrot variable, don't worry if it looks like this. It will not create any issues for the request.
The next step is to set the value of the variable RequestURL. This should contain the general string for the Cloud Vision API and reference the APIKey variable containing your API authentication key.
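For reference, the Cloud Vision annotate endpoint looks like this at the time of writing (assuming the v1 endpoint; check Google's documentation for the current version), with the API key passed as a query parameter referencing the APIKey variable:
https://vision.googleapis.com/v1/images:annotate?key=[%APIKey]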
It is now time to make the REST action. Go under Advanced in the action menu and select the action. It should be set up like this.
If all the variables are set up correctly, you can click OK to run the request. If you receive a status 200, the request was performed successfully. The output of the call should be something like this.
{
"responses": [
{
"labelAnnotations": [
{
"mid": "/m/03fwl",
"description": "Goat",
"score": 0.99238336,
"topicality": 0.99238336
}
]
}
]
}
Hurray! Google tells us, with a confidence score of 99.24%, that the image is of a "Goat". Let's have another go. Let's try with this image file. You can download the image file at the very end of the article.
Make sure to update the ImagePath variable and simply rerun the REST action. The output should be something like this.
{
"responses": [
{
"labelAnnotations": [
{
"mid": "/m/04rky",
"description": "Mammal",
"score": 0.9890478,
"topicality": 0.9890478
}
]
}
]
}
So, Google is telling us that the image is of a "Mammal". If we want Google to give additional information/results, let's adjust the RequestBody variable to use "maxResults":5 instead of "maxResults":1.
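The adjusted RequestBody is identical to the first example apart from the maxResults value:
{ "requests":[ { "image":{ "content": "{EncodeBase64*}[%ImagePath]{*EncodeBase64}" }, "features": [ { "type":"LABEL_DETECTION", "maxResults":5 } ] } ] }
Now, rerun the REST action and notice the change in the output.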
{
"responses": [
{
"labelAnnotations": [
{
"mid": "/m/04rky",
"description": "Mammal",
"score": 0.9890478,
"topicality": 0.9890478
},
{
"mid": "/m/09686",
"description": "Vertebrate",
"score": 0.9851104,
"topicality": 0.9851104
},
{
"mid": "/m/0306r",
"description": "Fox",
"score": 0.9820934,
"topicality": 0.9820934
},
{
"mid": "/m/02r5q6",
"description": "Red fox",
"score": 0.98200655,
"topicality": 0.98200655
},
{
"mid": "/m/01z5f",
"description": "Canidae",
"score": 0.981378,
"topicality": 0.981378
}
]
}
]
}
Let's try one more image with this Label Detection feature. Instead of an animal, let us try with an image of a robot. You can download the image file at the very end of the article.
Make sure to update the ImagePath variable. Also, let's set the RequestBody back to "maxResults":1. Now, rerun the REST action. The output should be something like this.
{
"responses": [
{
"labelAnnotations": [
{
"mid": "/m/06fgw",
"description": "Robot",
"score": 0.97720194,
"topicality": 0.97720194
}
]
}
]
}
So, even though the image is a drawing, Google successfully labelled the image as "Robot". Hopefully, this gives you a good idea of how you can label images.
Text extraction from images
Now, let's try another feature of image analysis. Instead of using Label Detection, let's use the OCR capabilities to retrieve the text from an image. You can read more about Text Detection here.
First, let's find an interesting image with some text. You can download the image file at the very end of the article.
Make sure to update the ImagePath variable. To perform OCR, or text detection, we need to adjust the RequestBody variable. Instead of "LABEL_DETECTION", we should use the feature "TEXT_DETECTION". Also, the maxResults parameter is no longer relevant. The RequestBody should be like this.
{ "requests":[ { "image":{ "content": "{EncodeBase64*}[%ImagePath]{*EncodeBase64}" }, "features": [ { "type":"TEXT_DETECTION" } ] } ] }
Now, rerun the REST action. When you perform the Text Detection call, the output is quite long as it offers information about every single detected character. To see the output as one string, go to the very end of the output. It should be something like this. Keep in mind that "\n" means new line.
"text": "Lorem\nipsum dolor sit\namet, consectetuer\nadipiscing elit, sed\ndiam nonummy nibh\neuismod tincidunt ut\nlaoret dolore magna\naliquam erat\nvolutpat. Ut\n"
Let's try this once more with a more challenging image. You can download the image file at the very end of the article.
Make sure to update the ImagePath variable and simply rerun the REST action. The output should be something like this.
"text": "PUSH BUTTON\nFOR\nCROSSWALK\nWARNING\nDEVICE\nCROSS WITH\nCAUTION\n"
Again, the output of the Text Detection is spot on! It's time to make this more challenging for the Cloud Vision API! Let's try to extract text from this image. You can download the image file at the very end of the article.
This should be challenging; the quality is quite poor! When you extract data from images like this, you should use the method "DOCUMENT_TEXT_DETECTION" instead of "TEXT_DETECTION", so update the RequestBody variable accordingly. This method is appropriate when working with dense document text - including handwriting - in an image.
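The updated RequestBody only differs from the previous one in the feature type:
{ "requests":[ { "image":{ "content": "{EncodeBase64*}[%ImagePath]{*EncodeBase64}" }, "features": [ { "type":"DOCUMENT_TEXT_DETECTION" } ] } ] }
Make sure to update the ImagePath variable and simply rerun the REST action. The output should be something like this.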
"text": "Delicacy\nHospitality India Pvt ltd\nNirapet Cross Road. Kukatpally, Hyderabad\nDate : 26/03/2018\nBilled By : Shashikala\nOrder Type : Zomato\nGuest Name : 72829016\nTime: 23:02\nBill No : 77\nTable : 27\nItem Name\nQty\nPrice\nChinese Veg Platter\n1\n339.0\nTotal no. items: 1\n:\nTotal no.qty: 1\nSub Total\nGst 5%\n339\n16.95\nGrand Total\n356.0\nThree Hundred Fifty Six Rupees Only.\nFree Home Delivery : 9133352228/9\nGSTIN : 36AAFCD6456M225\nThank you ! Please visit again.\n"
The output is actually almost perfect. There are a few things that the API gets wrong. For example, "Table : Z 7" is detected as "Table : 27", and "GSTIN : 36AAFCD6456M2Z5" is detected as "GSTIN : 36AAFCD6456M225" - two mistakes where it confuses "Z" with "2". The guest name is also missing its "\". But the output is still very close to perfect, which is quite impressive.
Let's try one last image analysis, this one will be pure handwriting. You can download the image file at the very end of the article.
Make sure to update the ImagePath variable and simply rerun the REST action. The output should be something like this.
"text": "on the deck\nforgotten coffee mug\nfull of rain\n"
This time, the output is perfect again, meaning that you can extract text from images with handwriting (if the image quality is good enough). That can come in quite handy!