Expertise: Computer Vision

Computer vision is a part of Data Science that focuses on creating systems with artificial vision: systems that understand visual signals and make decisions based on information extracted from images and video streams. In effect, Computer Vision systems act as artificial eyes and brains that can be integrated into many kinds of devices and systems.

Softarex has been accumulating experience in Computer Vision for the last 20 years. We have already implemented many Computer Vision systems in different areas and published more than 100 research articles in scientific publications and blogs. You can find some of them on our blog.

Problems solved by Computer Vision

A wide range of tasks can be solved using computer vision and deep learning technologies. Here are the main problems and the models commonly used to solve them:

  • Object detection - single-stage detectors from the YOLO family have proven to be accurate and reliable real-time solutions.
  • Object tracking - the FairMOT model is excellent for Multiple Object Tracking tasks.
  • Instance and semantic segmentation - semantic segmentation models range from simple ones like U-Net to more advanced and sophisticated ones like DeepLabv3+ and HRNet-OCR; Mask R-CNN and DetectoRS are state-of-the-art for instance segmentation.
  • Image classification - a well-established task; MobileNet or EfficientNet models are chosen depending on the target device.
  • Image-based regression (predicting measurements from images: weight, volume, size, key points, and distance) - each task may require the development of a unique model.
  • Face detection and face recognition - RetinaFace is one of the fastest and most accurate face detectors, and models trained with ArcFace loss achieve high recognition accuracy.
  • Deep image processing (super-resolution, style transfer, denoising, image generation) - GAN models are successfully used for various image processing and generation tasks.
  • Human pose estimation and recognition (skeleton key point detection) - DarkPose produces the best results across different backbone models.
  • Object re-identification (multi-camera tracking of objects such as people and cars) - combines tracking and recognition approaches.
  • Action detection (gesture recognition, event detection) - R(2+1)D and Inflated 3D ConvNet (I3D) are examples of models for event detection.
  • Document scanning (form recognition, text recognition) - Tesseract is considered a classic tool for text recognition; models based on CharNet and Differentiable Binarization are also successful.
  • 3D reconstruction - multiple-view geometry estimation and structure from motion (SfM) allow you to build 3D models from images with millimeter precision.
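Detectors such as the YOLO family typically emit several overlapping candidate boxes per object, and a standard post-processing step removes the duplicates with non-maximum suppression (NMS) based on intersection over union (IoU); IoU is also commonly used to associate detections across frames in tracking. The sketch below is a minimal greedy NMS in plain Python, assuming boxes in (x1, y1, x2, y2) pixel format; it is illustrative, not the implementation used by any particular model.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    every remaining box that overlaps it by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Two near-duplicate detections of one object plus a separate object.
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # indices of the boxes that survive
```

The 0.5 IoU threshold is a common default rather than a universal value; crowded scenes often need it tuned.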

To simplify a solution, predefined markers such as ArUco markers, QR codes, or barcodes can be used. They make detection, recognition, tracking, spatial localization, and scene calibration easier.
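As an illustration of marker-based scene calibration: once a marker of known physical size has been detected (for example with OpenCV's ArUco module, not shown here), its four corner coordinates give a simple millimeter-per-pixel scale for objects lying in the same plane. This is a minimal sketch assuming the camera looks roughly perpendicular to that plane and lens distortion has already been corrected; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def mm_per_pixel(corners: List[Point], marker_side_mm: float) -> float:
    """Scale from a square marker of known side length.
    Assumes a fronto-parallel view and an undistorted image, so all four
    sides have (almost) the same length in pixels; we average them."""
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return marker_side_mm / (sum(sides) / 4)


def measure_mm(p: Point, q: Point, scale: float) -> float:
    """Distance between two pixels in the marker plane, in millimeters."""
    return math.dist(p, q) * scale


if __name__ == "__main__":
    # Hypothetical detection: a 50 mm marker that appears as a 100 px square.
    corners = [(0, 0), (100, 0), (100, 100), (0, 100)]
    scale = mm_per_pixel(corners, marker_side_mm=50.0)
    print(scale, measure_mm((0, 0), (300, 400), scale))
```

For tilted views, a full homography or pose estimate from the marker corners is needed instead of a single scale factor.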

What you need to know if you want to enhance your business with Computer Vision and AI

  • Consider extending standard recommendation systems with visual ones: product images do not always come with a correct description, and such a system can generate one automatically.
  • The accuracy and speed of production lines can be improved with computer vision systems that supplement workplaces, so that employees do not need to interact with software control systems. The collected statistics also show how efficiently materials are consumed.
  • Video surveillance in production allows automatic identification of idle workers and optimization of the movement of goods, thus increasing efficiency.
  • Automatic performance analysis and smart training help reduce the cost of medical services and distribute the budget effectively.
  • CV in AR applications for online and offline sales has a stronger visual impact on customers and increases traffic to retail outlets thanks to quick virtual fitting of clothes. It also increases security when selling jewelry.
  • CV in precision farming improves the efficiency of land maintenance and reduces the cost of fertilizer and fuel.
  • Automatic recognition of document forms and handwriting improves the document management system in medical and public institutions. This in turn allows you to use digitized documents for data mining.

Development and Implementation Challenges

Our engineers have identified a range of technical problems you may face when implementing Computer Vision, and our knowledge and experience will help you avoid them. Here are the most common issues seen during implementation:

It is often non-trivial to translate a business problem into specific technical tasks, since it can be solved with different approaches. The choice between them is determined by the constraints of the scene and the availability of a dataset.

When developing a solution from scratch, you must first analyze the scene to determine the best camera placement: one that provides the best possible view with minimal overlap. Cameras should also be installed close enough to the monitored objects to capture fine detail, which in turn improves the accuracy of the Computer Vision algorithms.
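How close is "close enough" can be estimated before installation with the ideal pinhole camera model: an object of width W at distance D covers roughly W · f / D pixels, where the focal length in pixels is f = (image width / 2) / tan(HFOV / 2). The sketch below assumes an ideal pinhole camera without lens distortion and an object seen face-on; the required pixel count for a given algorithm is a project-specific assumption.

```python
import math


def focal_length_px(image_width_px: int, hfov_deg: float) -> float:
    """Focal length in pixels of an ideal pinhole camera with the given horizontal field of view."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def pixels_on_object(object_width_m: float, distance_m: float,
                     image_width_px: int, hfov_deg: float) -> float:
    """Approximate width, in pixels, of an object seen face-on at the given distance."""
    return object_width_m * focal_length_px(image_width_px, hfov_deg) / distance_m


def max_distance_m(object_width_m: float, required_px: float,
                   image_width_px: int, hfov_deg: float) -> float:
    """Farthest distance at which the object still spans `required_px` pixels."""
    return object_width_m * focal_length_px(image_width_px, hfov_deg) / required_px


if __name__ == "__main__":
    # Hypothetical setup: 1920 px wide sensor, 90 degree lens, 0.5 m wide object.
    print(pixels_on_object(0.5, 4.0, 1920, 90.0))  # about 120 px at 4 m
    print(max_distance_m(0.5, 60, 1920, 90.0))     # about 8 m for 60 px
```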

Next, define the tasks and how to solve them. You can use multi-task end-to-end models or develop a multistage algorithm based on single-task models. End-to-end solutions minimize the number of configurable parameters of the whole system, but require a large dataset annotated for all tasks. A multistage algorithm requires tuning at each stage, but in the end needs much less training data.
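To make the trade-off concrete, here is a minimal sketch of a two-stage algorithm built from single-task models: a detector proposes boxes and a separate classifier labels each crop. Each stage keeps its own threshold, which is exactly the extra tuning a multistage design requires. The `detect` and `classify` callables, the thresholds, and the "helmet" label are all hypothetical stand-ins for real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


@dataclass
class StageConfig:
    # Hypothetical per-stage parameters; each one is tuned separately.
    detector_threshold: float = 0.5
    classifier_threshold: float = 0.7


def run_pipeline(frame: Any,
                 detect: Callable[[Any], List[Tuple[Box, float]]],
                 classify: Callable[[Any, Box], Tuple[str, float]],
                 cfg: StageConfig) -> List[Tuple[Box, str]]:
    """Stage 1 proposes boxes; stage 2 labels each accepted box."""
    results = []
    for box, det_score in detect(frame):
        if det_score < cfg.detector_threshold:
            continue  # rejected by the detection stage
        label, cls_score = classify(frame, box)
        if cls_score >= cfg.classifier_threshold:
            results.append((box, label))
    return results


if __name__ == "__main__":
    # Stub models standing in for, e.g., a person detector and a helmet classifier.
    fake_detect = lambda frame: [((0, 0, 10, 10), 0.9), ((5, 5, 20, 20), 0.3), ((30, 30, 40, 40), 0.8)]
    fake_classify = lambda frame, box: ("helmet", 0.8) if box[0] == 0 else ("no_helmet", 0.6)
    print(run_pipeline(None, fake_detect, fake_classify, StageConfig()))
```

Because each stage is a separate single-task model, each can be trained on a smaller, task-specific dataset; an end-to-end model would replace both callables with one network trained on data annotated for both tasks.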

Machine learning requires a meaningful dataset. Data collection for such tasks takes a lot of time and can include searching for open-source datasets for similar tasks, manually cleaning datasets of outliers, or even annotating and preparing datasets from scratch.
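Manual cleaning can be focused with a simple automatic pre-filter. The sketch below flags annotated boxes whose area is unusual according to Tukey's interquartile-range rule, so a person only reviews the suspicious ones. Using box area as the outlier criterion and k = 1.5 are assumptions for illustration; other annotation attributes can be screened the same way.

```python
import statistics
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of an annotated box, in pixels


def flag_size_outliers(boxes: List[Size], k: float = 1.5) -> Tuple[List[Size], List[Size]]:
    """Split boxes into (kept, flagged) using Tukey's IQR rule on box area.
    Flagged boxes are candidates for manual review, not automatic deletion."""
    areas = [w * h for w, h in boxes]
    q1, _, q3 = statistics.quantiles(areas, n=4)  # needs at least two boxes
    spread = q3 - q1
    lo, hi = q1 - k * spread, q3 + k * spread
    kept = [b for b, a in zip(boxes, areas) if lo <= a <= hi]
    flagged = [b for b, a in zip(boxes, areas) if not lo <= a <= hi]
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical annotations: the last box is far larger than the rest.
    boxes = [(10, 10), (11, 10), (19, 5), (21, 5), (14, 7), (17, 6), (50, 100)]
    kept, flagged = flag_size_outliers(boxes)
    print("review:", flagged)
```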

Introducing Computer Vision into the data acquisition process speeds up dataset development. For example, adding Multiple Object Tracking functions to data annotation software allows the operator to annotate video significantly faster. After the operator marks the first frame, the software predicts the object's position in subsequent frames, and the operator only needs to correct or confirm the annotation.
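The annotation workflow described above can be sketched with the simplest possible motion model: a constant-velocity guess that shifts the latest box by its last frame-to-frame displacement, while any operator correction overrides the guess. Real annotation tools would use a proper tracker (for example a Kalman filter or a model such as FairMOT); the `corrections` dictionary here is a hypothetical stand-in for the operator's input.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def predict_next(prev: Box, curr: Box) -> Box:
    """Constant-velocity guess: shift the current box by the last displacement."""
    return tuple(c + (c - p) for p, c in zip(prev, curr))


def propagate_track(first: Box, num_frames: int, corrections: Dict[int, Box]) -> List[Box]:
    """Annotate `num_frames` frames from one manual box.
    `corrections` maps frame index -> box the operator entered; frames
    without a correction keep the predicted box (operator confirmed it)."""
    track = [first]
    for f in range(1, num_frames):
        # With only one known box, the best guess is "the object did not move".
        guess = track[-1] if f == 1 else predict_next(track[-2], track[-1])
        track.append(corrections.get(f, guess))
    return track


if __name__ == "__main__":
    # The operator marks frame 0 and fixes frame 1; frames 2-3 are predicted.
    for box in propagate_track((0, 0, 10, 10), 4, {1: (2, 0, 12, 10)}):
        print(box)
```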

Iterative training with pseudo-labeling can also dramatically reduce model development time. With this method, a small dataset is annotated and a first model is trained on it. This model is then used to pre-annotate more data, with the main emphasis on the examples where the model makes errors. After the dataset is expanded, the model is re-trained and the target metrics are checked. If the model's error falls within the specified range, the process is complete; otherwise, the cycle is repeated.
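The iterative training loop described above can be sketched on a deliberately tiny problem. The current model proposes a label for each unlabeled sample; the samples it is least sure about (smallest margin, used here as an assumed proxy for "where the model is in error") go to an annotator, who confirms or corrects them; then the model is retrained until the target error is reached. The 1D threshold "model" and all data are hypothetical; in practice `train` would be a real network.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, label 0/1)


def train(labeled: List[Sample]) -> float:
    """Toy 1D model: threshold halfway between the largest negative
    and the smallest positive labeled example."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2


def error_rate(threshold: float, test: List[Sample]) -> float:
    return sum((x >= threshold) != bool(y) for x, y in test) / len(test)


def iterative_training(labeled: List[Sample], pool: List[float],
                       annotate: Callable[[float, int], int], test: List[Sample],
                       target_error: float, batch: int = 2, max_rounds: int = 10):
    labeled, pool = list(labeled), list(pool)
    threshold = train(labeled)
    for _ in range(max_rounds):
        if error_rate(threshold, test) <= target_error or not pool:
            break  # target metric reached (or nothing left to label)
        # Rank the pool with the current model: smallest margin first.
        pool.sort(key=lambda x: abs(x - threshold))
        picked, pool = pool[:batch], pool[batch:]
        # The model's guess is the pseudo-label; the annotator confirms or corrects it.
        labeled += [(x, annotate(x, int(x >= threshold))) for x in picked]
        threshold = train(labeled)  # re-train on the expanded dataset
    return threshold, len(labeled)


if __name__ == "__main__":
    oracle = lambda x, guess: int(x >= 3.0)  # hypothetical annotator who always knows the truth
    test_set = [(2.0, 0), (2.8, 0), (3.3, 1), (4.0, 1), (6.0, 1)]
    print(iterative_training([(1.0, 0), (9.0, 1)], [0.5, 2.5, 2.9, 3.5, 4.5, 7.0, 8.0],
                             oracle, test_set, target_error=0.0))
```

In this toy run only four of the seven unlabeled samples need a human label before the error target is met, which is the effect the method relies on.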

Computer vision specialists must determine what is required of the trained models, for example whether segmentation must be highly accurate or a rough estimate is sufficient, and choose a loss function that tells the model what to focus on during training.
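A small example of why the choice of loss matters: per-pixel binary cross-entropy (BCE) averages over all pixels, so when the object is small the background dominates, while the Dice loss measures overlap with the foreground directly. The sketch below compares both on a hypothetical 100-pixel mask with a 2-pixel object that the prediction misses entirely.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def dice_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """1 - soft Dice coefficient: driven by foreground overlap only."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)


if __name__ == "__main__":
    # Hypothetical flattened 10x10 mask: 2 foreground pixels, 98 background.
    target = [1, 1] + [0] * 98
    pred = [0.05] * 100  # the model predicts "background" everywhere
    print(f"BCE:  {bce_loss(pred, target):.3f}")   # small, although the object is missed
    print(f"Dice: {dice_loss(pred, target):.3f}")  # close to 1, flags the miss
```

In practice the two are often combined, so that BCE keeps per-pixel gradients stable while Dice keeps the focus on small structures.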

Deep models need substantial computational resources, so a balance between accuracy and cost must be found. Training deep networks, and sometimes inference as well, may require a GPGPU cluster.

Scalability and bandwidth limitations can become a problem when the vision system has a large number of cameras. In this case, deploying intermediate computing nodes/servers is a good solution.
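A first sizing of intermediate nodes can be done with simple arithmetic: the number of nodes must cover both the incoming video bandwidth and the inference compute, whichever is larger. The sketch below assumes uniform cameras and nodes; all figures in the example are hypothetical, and a real plan would add headroom for peaks and failover.

```python
import math
from dataclasses import dataclass


@dataclass
class Deployment:
    cameras: int
    bitrate_mbps: float      # encoded stream bitrate per camera
    fps: float               # frames analysed per camera per second
    node_ingest_mbps: float  # network capacity of one intermediate node
    node_fps: float          # frames per second one node's model can process


def nodes_needed(d: Deployment) -> int:
    """Nodes required to absorb both the video traffic and the inference load."""
    by_bandwidth = math.ceil(d.cameras * d.bitrate_mbps / d.node_ingest_mbps)
    by_compute = math.ceil(d.cameras * d.fps / d.node_fps)
    return max(by_bandwidth, by_compute)


if __name__ == "__main__":
    # Hypothetical site: 200 cameras at 4 Mbps, 10 analysed frames/s each.
    site = Deployment(cameras=200, bitrate_mbps=4, fps=10, node_ingest_mbps=1000, node_fps=400)
    print(nodes_needed(site))  # here compute, not bandwidth, is the bottleneck
```

The design benefit of edge nodes is that only compact results (events, coordinates) travel to the central server instead of the full video streams.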

Mobile deployment includes optimizing the model for real-time inference, which requires optimized algorithms for a wide range of platforms: about 30 different SoC models now cover 50% of the market, and each needs an individual optimization approach. This requires a lot of work from low-level optimization engineers.
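One common optimization for mobile inference is 8-bit quantization, which toolchains such as TensorFlow Lite support. The sketch below shows the basic per-tensor affine scheme in plain Python: real values map to integers through a scale and a zero point, so that 0.0 is represented exactly. It is a simplified illustration; production toolchains calibrate ranges on representative data and may use symmetric or per-channel variants.

```python
from typing import List, Tuple


def quantize(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Per-tensor affine quantization to unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero tensor
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]  # hypothetical layer weights
    q, scale, zp = quantize(weights)
    print(q, scale, zp)
    print(dequantize(q, scale, zp))  # close to the originals, error <= scale
```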

If you are planning a project in Manufacturing, such as tracking within a factory, you will likely have to deal with devices of different sizes and shapes. In such tasks, automatic calibration of the system's parameters for each specific production site is an integral part of the system.
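One parameter that often has to be calibrated per site is the detection confidence threshold. The sketch below picks the lowest threshold that still meets a target precision on validation detections recorded at that site (the lowest such threshold gives the highest recall). The labeled validation data and the precision target are assumptions for illustration, not a description of a specific system.

```python
from typing import List, Optional, Tuple


def calibrate_threshold(scored: List[Tuple[float, bool]],
                        target_precision: float) -> Optional[float]:
    """`scored` holds (confidence, is_correct) for detections on site validation footage.
    Returns the lowest threshold whose precision meets the target, or None."""
    for t in sorted({score for score, _ in scored}):  # ascending: lowest first
        kept = [ok for score, ok in scored if score >= t]
        if sum(kept) / len(kept) >= target_precision:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical validation detections from one production site.
    site_data = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, False)]
    print(calibrate_threshold(site_data, target_precision=0.75))
```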

All software for computer vision tasks needs to be highly performant to provide real-time video processing. Therefore, the last step before deployment is to compile the model into a more optimized format using libraries such as TensorRT, TVM, TensorFlow Lite, Core ML, and OpenVINO.

Software implementations of video processing algorithms need to be optimized for your specific tasks, so generic, one-size-fits-all algorithms usually cannot be used.

When developing Computer Vision systems for tasks performed by human workers, such as production line monitoring, special care and attention are required. To make worker behavior more predictable, it may be necessary to refine the production process protocols to minimize possible overlaps and interference from workers.

Read more about the Computer Vision algorithms and approaches we use

Case Study