Analyze Images with Python & Computer Vision Service in Azure

8 min read · Sep 4, 2021

In this article, we will see how Python and the Azure Computer Vision service help us analyze images and extract information.

What is Computer Vision?

Let’s start with one of the main keywords of this blog: Computer Vision. In simple terms, it means giving computers a sense of vision similar to human vision. More formally, IBM describes it as a branch of artificial intelligence that enables computers and information systems to derive meaningful information from digital images, videos, and other visual inputs. Based on that information, we as users can take actions, make decisions, or make recommendations about the data we have. If AI enables computers to think, Computer Vision enables them to see, observe, and understand. It has been one of the prominent research topics in computer science over the last few decades, with a history of around 60–70 years. In this blog, we are going to explore how the Computer Vision cognitive service in Microsoft Azure can be used to analyze images.

Why Python?

AI and ML projects differ from traditional software projects: they need stability and flexibility, and they need tools that support deep research. Python offers all of these through an extensive selection of libraries and frameworks, a high-level and concise syntax, ease of use, and flexibility. That’s why we see lots of AI projects in Python. If you are not familiar with Python control statements, loops, string manipulation, lists, and functions, it may be a little hard to get started and understand the code snippets.

Azure Cognitive Services

Azure Cognitive Services is a set of cloud-based building blocks for AI, provided by Microsoft Azure, that we can integrate into our applications. Basic knowledge of Azure Cognitive Services is crucial, including how to provision, secure, and deploy cognitive services to build intelligent solutions. The parts required for this application are covered later in the article.

Azure Cognitive Services offers its prebuilt AI services in several categories, with a number of integrated services in each category.

Azure Computer Vision service

The primary purpose of the Azure Computer Vision service is to help users extract information from images and videos. It includes pre-built machine learning models for analyzing images, and it supports the following use cases:

· Description and tag generation — determining an appropriate caption for an image, and identifying relevant “tags” that can be used as keywords to indicate its subject.

· Object detection — detecting the presence and location of specific objects within the image.

· Face detection — detecting the presence, location, and features of human faces in the image.

· Image metadata, color, and type analysis — determining the format and size of an image, its dominant color palette, and whether it contains clip art.

· Category identification — identifying an appropriate categorization for the image, and if it contains any known celebrities or landmarks.

· Brand detection — detecting the presence of any known brands or logos.

· Moderation rating — determine if the image includes any adult or violent content.

· Optical character recognition — reading text in the image.

· Smart thumbnail generation — identifying the main region of interest in the image to create a smaller “thumbnail” version.

In this blog, we will focus on image analysis and thumbnail generation capabilities of computer vision.
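As a rough guide to how the use cases above map onto the API, the sketch below pairs each capability with the `visualFeatures` value accepted by the Analyze Image operation in v3.x of the REST API. The dictionary keys and the helper name `features_for` are labels made up for this illustration. OCR and thumbnail generation are served by separate operations, so they are marked `None` here.

```python
# Capability labels (keys) are illustrative names chosen for this sketch.
# The values are the visualFeatures names used by the v3.x Analyze Image operation.
CAPABILITY_TO_FEATURE = {
    "description": "Description",
    "tags": "Tags",
    "object detection": "Objects",
    "face detection": "Faces",
    "image type": "ImageType",
    "color": "Color",
    "categories": "Categories",
    "brand detection": "Brands",
    "moderation rating": "Adult",
    # Served by separate operations rather than Analyze Image.
    "ocr": None,
    "thumbnail": None,
}


def features_for(capabilities):
    """Return the visualFeatures values for the requested capabilities."""
    features = []
    for name in capabilities:
        feature = CAPABILITY_TO_FEATURE[name.lower()]
        if feature is None:
            raise ValueError(f"{name!r} is not part of Analyze Image")
        if feature not in features:
            features.append(feature)
    return features


print(features_for(["description", "tags", "object detection"]))
```

The resulting list is what you would later pass as the set of visual features when requesting an analysis.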

Provision of the required resources

Before moving to the hands-on part, you need a Microsoft Azure account with a Microsoft Azure subscription. If you don’t already have one, you can sign up for a free trial at https://azure.com/free.

Free Tip

You can use a standalone Azure Computer Vision resource for this application, or you can use the Computer Vision API through a multi-service Cognitive Services resource. A Cognitive Services resource supports multiple cognitive services, so Text Analytics, Computer Vision, Speech, and other services are available through a single resource. I recommend using a Cognitive Services resource for AI-intensive applications where you need a variety of cognitive services. For this application, however, the Computer Vision service is quite enough.

Step 1 — Go to the Azure Portal and Search for Computer Vision Service/Cognitive Service

I’ll create a Computer Vision resource, since that is the only service this application needs, but you can provision an Azure Cognitive Services resource as well.

Step 2 — Click the + Create a resource button

As shown in the previous image, to provision a Computer Vision service you need to click the Create computer vision button. It will redirect you to the following page.

After that, fill in the project details in the required fields, and make sure to give the instance a unique name. You can use any resource group you like. If you don’t have a resource group yet, create one by clicking the Create new option. After that, the page will look like this.

Then press the Review + create button, keeping the default settings for the other parameters. Wait until Azure finishes allocating the resource, then click the Go to resource button. You have now created your Azure Computer Vision resource and are ready for step 3.

Step 3 — Collect your Keys and Endpoints from your computer vision resource page.

To consume the service through its endpoint, applications need the endpoint URI, the subscription key, and the resource location. That is why we collect these values here.
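One way to keep these three values out of your source code is to read them from environment variables. The sketch below is only an illustration: the variable names `COG_ENDPOINT`, `COG_KEY`, and `COG_LOCATION`, the `VisionConfig` class, and the `load_config` helper are all hypothetical and are not part of Azure or the SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionConfig:
    endpoint: str
    key: str
    location: str

    def masked_key(self):
        """Show only the last four characters of the key when logging."""
        return "*" * max(len(self.key) - 4, 0) + self.key[-4:]


def load_config(env=os.environ):
    """Read the settings from environment variables (hypothetical names)."""
    try:
        endpoint = env["COG_ENDPOINT"]
        key = env["COG_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Missing setting: {missing.args[0]}") from None
    location = env.get("COG_LOCATION", "")
    # Strip a trailing slash so later URL joins do not produce "//".
    return VisionConfig(endpoint.rstrip("/"), key, location)


# Example with made-up values passed in as a plain dictionary.
config = load_config({
    "COG_ENDPOINT": "https://example.cognitiveservices.azure.com/",
    "COG_KEY": "abcd1234",
    "COG_LOCATION": "westus",
})
print(config.endpoint, config.masked_key(), config.location)
```

Masking the key before printing is a small precaution so that it does not end up in notebook output or logs.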

Set up Visual Studio Code

For this application, we use Python as our programming language, as explained earlier, and VS Code as our integrated development environment. First, you need to install the packages required for the image analysis demonstration. Let’s see how:

1. Start VS Code

2. Clone the repository from https://github.com/MicrosoftLearning/mslearn-ai900 using Git Bash, GitHub Desktop, or any other method you are familiar with

3. After cloning, open that folder using VS Code and open the file called 01-Image Analysis with Computer Vision.ipynb

4. Install the required package using the following command before proceeding to the next sections.

pip install azure-cognitiveservices-vision-computervision==0.7.0
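If you want to confirm that the installation worked before opening the notebook, a small check with the standard library is enough. This is a minimal sketch, and `installed_version` is a helper name invented for the example.

```python
from importlib import metadata

PACKAGE = "azure-cognitiveservices-vision-computervision"


def installed_version(package=PACKAGE):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print(f"{PACKAGE} is not installed; run the pip command above.")
else:
    print(f"{PACKAGE} {version} is installed.")
```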

Start Working on the Code

Step 1 — Paste your key and endpoints to the respective parameters

Run the first code segment. After running this snippet, you will get a string output containing your key and endpoint.
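A cell of this kind could look roughly like the sketch below. The variable names `cog_key` and `cog_endpoint` and the helper `make_client` are assumptions for illustration, and the notebook’s own cell may differ. The SDK is imported inside the function so that the placeholder checks still run if the package is not installed yet.

```python
def make_client(endpoint, key):
    """Create a ComputerVisionClient from an endpoint URL and a key."""
    if not endpoint.startswith("https://"):
        raise ValueError("endpoint should be the https:// URL from the portal")
    if not key or key.startswith("YOUR_"):
        raise ValueError("replace the placeholder with your real key")
    # Imported lazily so the checks above work even without the SDK installed.
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from msrest.authentication import CognitiveServicesCredentials

    return ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))


# Placeholders: paste the values from the Keys and Endpoint page.
cog_key = "YOUR_COG_KEY"
cog_endpoint = "YOUR_COG_ENDPOINT"

try:
    client = make_client(cog_endpoint, cog_key)
    print(f"Ready to use cognitive services at {cog_endpoint}")
except ValueError as problem:
    print(f"Not ready yet: {problem}")
```

With the placeholders still in place, the sketch reports that it is not ready. Once real values are pasted in, `client` can be reused for every later call.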

Step 2 — Analyze an image using the computer vision service

Analyze an image

To analyze an image, Cognitive Services provides REST application programming interfaces (APIs) that client applications can use to consume the services. You can use the REST method or the equivalent method in the SDK for Python or another programming language, specifying the visual features you want to include in the analysis (and, if you select categories, whether or not to include details of celebrities or landmarks). This method returns a JSON document containing the requested information. You can request image features including the image description, tags, and objects in the image. The default JSON document has the following format.
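To make the REST route concrete, here is a minimal sketch that builds, but does not send, an Analyze Image request using only the standard library, and then reads the top caption out of a JSON reply. The API version in `API_PATH` is an assumption, the helper names are invented for the example, and the sample response contains made-up values that only imitate the shape of the service’s JSON.

```python
import json
import urllib.parse
import urllib.request

API_PATH = "/vision/v3.2/analyze"  # assumed API version


def build_analyze_request(endpoint, key, image_bytes, features, details=()):
    """Build (but do not send) a POST request for the Analyze Image operation."""
    params = {"visualFeatures": ",".join(features)}
    if details:
        params["details"] = ",".join(details)
    url = endpoint.rstrip("/") + API_PATH + "?" + urllib.parse.urlencode(params)
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def best_caption(response_json):
    """Return (text, confidence) of the top caption, or None if there is none."""
    captions = json.loads(response_json).get("description", {}).get("captions", [])
    if not captions:
        return None
    top = max(captions, key=lambda caption: caption["confidence"])
    return top["text"], top["confidence"]


# Illustrative response only: the values are made up for this example.
sample = json.dumps({
    "description": {
        "tags": ["person", "indoor"],
        "captions": [{"text": "a person in a store", "confidence": 0.9}],
    }
})
print(best_caption(sample))
```

To actually call the service, you would pass the request to `urllib.request.urlopen` and hand the decoded body to `best_caption`.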

Python Code to Analyze an image
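A minimal sketch of this step with the Python SDK is shown below. It assumes `client` is a `ComputerVisionClient` created with your endpoint and key. The helper names and the `data/vision` folder layout are assumptions, and the notebook’s own cell may differ in detail.

```python
import os


def describe_image(client, image_path, max_candidates=1):
    """Ask the service for captions describing the image at image_path."""
    with open(image_path, "rb") as image_stream:
        return client.describe_image_in_stream(image_stream, max_candidates=max_candidates)


def caption_lines(description):
    """Format each caption as 'text (confidence: NN.NN%)'."""
    return [
        f"{caption.text} (confidence: {caption.confidence:.2%})"
        for caption in description.captions
    ]


# Path layout assumed from the cloned repository; adjust it to your own image.
image_path = os.path.join("data", "vision", "store_cam1.jpg")
```

With a real client in hand, `for line in caption_lines(describe_image(client, image_path)): print(line)` prints one line per caption.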

For this application, I have used a default image called store_cam1.jpg provided by Microsoft Learn, but you can use any image you want. Make sure to set the correct path to the image before running the code. The output will look like this.

Let’s try another image. This one shows a construction site, and we will see how the Computer Vision service describes it.

The service returns its result together with a confidence level, shown here as a percentage between 0 and 100. In this example, the confidence level is 41.31%, and the description matches what we expected for this image.
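The service itself reports confidence as a fraction between 0 and 1, so the percentage is that value multiplied by 100. The sketch below shows the conversion and a simple threshold check. The helper names and the default threshold of 0.5 are arbitrary choices for illustration, not values recommended by the service.

```python
def as_percent(confidence):
    """Convert a 0-1 confidence score into a percentage string."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence is expected to lie between 0 and 1")
    return f"{confidence:.2%}"


def is_confident(confidence, threshold=0.5):
    """Return True when the score meets a caller-chosen threshold."""
    return confidence >= threshold


score = 0.4131  # the construction-site example, expressed as a 0-1 score
print(as_percent(score))                    # 41.31%
print(is_confident(score))                  # False
print(is_confident(score, threshold=0.4))   # True
```

A check like this is useful when you want to show a caption only if the service is reasonably sure about it.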

Step 3 — Analyze Image Features

So far we have generated descriptive captions for these images; now let’s analyze them in more detail. The Computer Vision service provides analysis capabilities that can extract detailed information such as the locations of common types of objects, the location and approximate age of human faces in the image, whether the image contains any ‘adult’, ‘racy’, or ‘gory’ content, and relevant tags that could be associated with the image in a database to make it easy to find.

The code snippet we are using for this task is provided below,

And it outputs the following results,

For the first image, the service identifies objects, which are drawn with bounding boxes, along with the ratings and tags it has identified. Quite amazing, right?

That concludes our implementation, and I hope you end up with these results too. Don’t forget to try different images by trial and error and compare the outputs.
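For readers who want to collect the same information as plain data rather than a picture, the following is a minimal sketch using the Python SDK. It assumes `client` is a `ComputerVisionClient`. The helper names `analyze_image` and `summarize` are invented for this example, and the notebook’s own code, which also draws the bounding boxes, may differ.

```python
DEFAULT_FEATURES = ("Description", "Tags", "Adult", "Objects", "Faces")


def analyze_image(client, image_path, features=DEFAULT_FEATURES):
    """Request several visual features for one image in a single call."""
    with open(image_path, "rb") as image_stream:
        return client.analyze_image_in_stream(image_stream, visual_features=list(features))


def summarize(analysis):
    """Collect the objects, tags and ratings from an analysis into plain data."""
    objects = [
        (obj.object_property, (obj.rectangle.x, obj.rectangle.y, obj.rectangle.w, obj.rectangle.h))
        for obj in (analysis.objects or [])
    ]
    tags = [(tag.name, tag.confidence) for tag in (analysis.tags or [])]
    ratings = {}
    if analysis.adult is not None:  # present only when "Adult" was requested
        ratings = {
            "adult": analysis.adult.is_adult_content,
            "racy": analysis.adult.is_racy_content,
            "gory": analysis.adult.is_gory_content,
        }
    return {"objects": objects, "tags": tags, "ratings": ratings}
```

Each object comes back with the `x`, `y`, `w`, and `h` of its bounding box, which is the information needed to draw the boxes over the image.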

Conclusion & Future Learning

In this blog, we have looked at how the Azure Computer Vision service extracts information from images and analyzes them through ratings and tags. To learn more about the Computer Vision service, refer to the Computer Vision documentation. [Documentation]

You can also try out other SDKs instead of Python to implement AI applications.

Thank You!