mehran behzadi
6 min read · Jun 21, 2021


Artificial Intelligence: Computer Vision

What is computer vision?

If you were asked to describe what you see in this picture, you would say something like “the sun is shining on the mountains from the background.” This seems like a simple task that any human, even a baby, can accomplish. If you think about it more deeply, however, a great deal of processing, understanding, and ordering of tasks is happening in the background. Human vision is a very complex system that connects what the eyes see to our mental model of objects.

Digital equipment such as cameras can surpass the limits of human vision in some respects: a camera can record each color precisely and distinguish between colors easily. To a computer, however, a picture is a problem to be solved, and it is understood in a different way. To a computer, a picture consists of an array of pixels, numerical values that represent colors and together reproduce the picture the way it looks in real life. Computer vision is not just about converting pictures into pixels; you also need to understand how information is extracted from the pixels and then interpreted.
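To make the "array of pixels" idea concrete, here is a minimal sketch in plain Python. The tiny image, the `to_grayscale` helper, and the plain-average formula are all illustrative assumptions; real images are far larger and are usually loaded with an imaging library.

```python
# A tiny 2x3 color image written out by hand (hypothetical data).
# Each pixel is (red, green, blue); each channel is an integer from 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]

def to_grayscale(rgb_image):
    """Reduce each RGB pixel to one brightness value using a plain average."""
    return [[sum(pixel) // 3 for pixel in row] for row in rgb_image]

height = len(image)       # number of rows
width = len(image[0])     # number of pixels per row
gray = to_grayscale(image)

print(height, width)
print(gray)
```

Everything a vision algorithm does later, from finding edges to recognizing objects, starts from grids of numbers like these.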

Neural networks and deep learning are making computer vision increasingly capable of replicating human vision and, in some tasks, of surpassing human capabilities and limits.

“An artificial neural network (ANN) is the piece of a computing system designed to simulate the way the human brain analyzes and processes information. It is the foundation of artificial intelligence (AI) and solves problems that would prove impossible or difficult by human or statistical standards.”

How does computer vision work?

A computer does not have the advantages a human has when it comes to understanding and interpreting an image. To an algorithm, an image is an array of integer values that describe the intensity of each color, in other words a pile of data. As technology progressed, we developed algorithms that use machine learning to function in a way loosely similar to the human brain. Machine learning lets an algorithm use the content of a data set to learn how the information is organized and what it represents. Given enough data, machine learning can even exceed human abilities and recognize images with better accuracy than human vision. With machine learning, we can teach computers to recognize and interpret images at different sizes and rotations; however, teaching computers this way can require millions of training examples, along with testing.

Consider the picture below. The computer learns the shape in the image by guessing, through trial and error. For example, the computer looks at the image and chooses one of the options, say a triangle. The computer is then told that the answer is incorrect and is shown the correct answer, which in this case is a square, and it stores that answer in relation to the shape shown.
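The guess-and-correct loop can be sketched with a perceptron, one of the simplest learning algorithms that updates itself only when it guesses wrong. This is an illustration under stated assumptions, not the method behind any particular system: the four 3x3 images, the labels, and the function names `guess` and `train` are made up for this example.

```python
# Tiny 3x3 black-and-white images, flattened row by row (1 = ink, 0 = blank).
# The images and labels are hypothetical training data.
SQUARE, TRIANGLE = 1, -1
examples = [
    ([1, 1, 1, 1, 1, 1, 1, 1, 1], SQUARE),    # filled square
    ([1, 0, 0, 1, 1, 0, 1, 1, 1], TRIANGLE),  # right angle at bottom-left
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], SQUARE),    # square outline
    ([0, 0, 1, 0, 1, 1, 1, 1, 1], TRIANGLE),  # right angle at bottom-right
]

def guess(weights, bias, pixels):
    """Guess a shape from a weighted sum of the pixel values."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return SQUARE if score > 0 else TRIANGLE

def train(examples, max_epochs=20):
    """Show the examples repeatedly; adjust the weights after each wrong guess."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for pixels, answer in examples:
            if guess(weights, bias, pixels) != answer:
                mistakes += 1
                # Correction step: nudge the weights toward the right answer.
                weights = [w + answer * p for w, p in zip(weights, pixels)]
                bias += answer
        if mistakes == 0:  # a full pass with no wrong guesses: stop
            break
    return weights, bias, epoch

weights, bias, epochs_used = train(examples)
predictions = [guess(weights, bias, pixels) for pixels, _ in examples]
print(epochs_used, predictions)
```

With all weights at zero, the very first guess for the filled square is "triangle", which is wrong, so the model is corrected, much like the example in the text. Real systems use far more examples and more capable models, but the learn-from-mistakes loop is the same idea.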

Machine learning finds patterns by trial and error, learning from its mistakes. The result of training is a statistical model that does the guessing for the computer by analyzing and interpreting an image. Guessing shapes is an easy task, but how does the computer make sense of more complex images, like an image of a tire?

The computer breaks the image down into smaller, simpler patterns. It does this with a neural network, which is made up of many layers. The first layer takes the pixels as numerical values and identifies edges; the following layers of neurons combine those edges into simple shapes, and so on, until the computer puts everything together to understand the image.
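The edge-finding step of a first layer can be sketched by sliding a small grid of weights, called a kernel, across the image. In a trained network these weights are learned from data; here a hand-picked Sobel kernel is assumed purely for illustration, and the `filter_image` helper and the sample image are hypothetical.

```python
def filter_image(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    This is the sliding-window operation used in convolutional layers
    (strictly a cross-correlation, with no padding and a stride of 1).
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Grayscale image: dark on the left, bright on the right (hypothetical data).
image = [
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
]

# Sobel kernel that responds to left-to-right changes in brightness.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

edges = filter_image(image, sobel_x)
print(edges)
```

The output is zero where the brightness is flat and large where the window straddles the dark-to-bright boundary, which is exactly the kind of "edge" signal later layers can combine into shapes.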

Growth of computer vision

Before the arrival of deep learning, the tasks computer vision could perform were very limited and required a lot of manual coding and effort. For example, to perform facial recognition you first had to create a database by capturing images of all the individual subjects. Then you had to annotate the images, recording several key data points, such as the distance between the eyes and dozens of other measurements that uniquely describe each individual. The final step was to capture new images, from photographs or video, and repeat the same process of marking key points.

After all these steps had been carried out manually, the application could compare the measurements from the new images with the ones stored in its database and tell whether they corresponded to any of the profiles. Very little automation was involved, and the results were often error-prone.
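The comparison step in this older approach can be sketched as a nearest-match lookup. The profile names, the measurement values, the `max_distance` threshold, and the `find_match` helper are all hypothetical; a real system would use dozens of measurements and a carefully tuned threshold.

```python
import math

# Hypothetical database: a few hand-annotated measurements per person,
# for example eye distance, nose length, and mouth width in millimeters.
profiles = {
    "alice": [62.0, 48.5, 35.0],
    "bob": [58.0, 52.0, 31.5],
}

def find_match(measurements, profiles, max_distance=2.0):
    """Return the closest profile name, or None if nothing is close enough."""
    best_name, best_distance = None, float("inf")
    for name, stored in profiles.items():
        # Straight-line (Euclidean) distance between the two measurement lists.
        distance = math.dist(measurements, stored)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None

known_face = find_match([61.5, 48.0, 35.5], profiles)
unknown_face = find_match([70.0, 40.0, 40.0], profiles)
print(known_face, unknown_face)
```

The sketch also hints at why this approach was fragile: every measurement has to be marked by hand, and small annotation errors can push a genuine match past the threshold.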

Machine learning provided a different solution to computer vision problems. It was no longer necessary to code every instruction manually. Instead, you would program smaller applications that could identify certain patterns in images. Statistical learning algorithms such as linear regression, logistic regression, decision trees, or support vector machines were able to detect patterns and classify the images that contained them.

Applications of computer vision

1. Defect detection

Defect detection systems based on computer vision have been available for several decades; however, inspecting the full extent of production is still not easy. With computer vision, we can detect defects as small as 0.05 mm, such as cracks in metal, paint flaws, or improper prints. The cameras need an intelligent algorithm that allows them to distinguish between what is a defect and what is not.
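As a rough illustration of how a defect can be told apart from normal variation, the sketch below compares a captured image against a known-good reference and flags only the pixels that differ by more than a tolerance. This is a deliberately simple stand-in for the intelligent algorithm mentioned above; the `find_defects` helper, the pixel values, and the tolerance of 20 are assumptions.

```python
def find_defects(reference, sample, tolerance=20):
    """Return (row, column) positions where the sample differs from the reference."""
    defects = []
    for r, (ref_row, sample_row) in enumerate(zip(reference, sample)):
        for c, (ref_px, sample_px) in enumerate(zip(ref_row, sample_row)):
            # Small differences are treated as sensor noise, not defects.
            if abs(ref_px - sample_px) > tolerance:
                defects.append((r, c))
    return defects

# Grayscale image of a known-good part (hypothetical values).
reference = [
    [200, 200, 200],
    [200, 200, 200],
    [200, 200, 200],
]

# Captured image of a new part: slight noise everywhere, one dark spot.
sample = [
    [198, 203, 200],
    [200, 90, 201],
    [199, 200, 202],
]

defects = find_defects(reference, sample)
print(defects)
```

A fixed tolerance like this breaks down under changing lighting or part positioning, which is one reason production systems rely on more robust, learned models.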

2. Metrology

Another interesting application is metrology, where computer vision is an alternative to the complex equipment and probes used in laser metrology. Keeping a good reference is the key to measuring accurately, and it is crucial to work with the correct lighting for every type of material.

3. Intruder detection

With the help of hyperspectral cameras, it’s possible to detect foreign objects in food, for example to differentiate between fruit and stones, making food safer for consumers. By measuring the wavelengths a material reflects, hyperspectral cameras can distinguish between different types of material. As a result, stones can be told apart from fruit, plastics from metals, and so on for any combination of different materials.

4. Assembly verification

The complexity of assemblies is increasing every day, with a greater number of components and connections. Computer vision allows us to check step by step whether every component is positioned correctly or, at the end, whether the final assembly is correct. This kind of check can be especially beneficial for complex mechanical or electronic assemblies. With these systems, very complex operations can be performed in significantly reduced cycle times.

5. Screen reader

Sometimes the data shown on a display screen cannot be accessed directly, either because it comes from a closed-source supplier system or because that system is not compatible with the one in use. This problem can be solved by installing a computer vision camera that reads the screen and extracts data from it (temperatures, codes, tensions, and anything else you need to know that appears on the screen). Using an optical character recognition (OCR) algorithm, we can extract the relevant information from the regions of interest.

https://youtu.be/qelFwuD_Ey4
