Detecting humans, part II – Building a real-world viable system

Human detection is a rapidly growing area of computer vision, and it already has wide industrial and commercial applications. In the first part of this blog series, we covered some of its most common industrial use cases in safety, people flow and security surveillance. However, building a viable real-world human detection system requires more than detection algorithms alone: they need to be combined with other capabilities. In part II of this blog series, we’ll introduce some of the building blocks of production-grade human detection systems:

Identifying humans: object detection algorithms

Intuitively enough, all of the use cases presented are built on algorithms that can identify humans in images. Here, it is often sensible to use transfer learning to fine-tune existing backbone models that have already been trained on vast image data sets to infer shapes from edges and corners at the pixel level. Simply put, transfer learning means reusing capabilities learned for one task in another. In this case, instead of training a model from scratch to detect humans, you start from an existing model in which the basic capability to identify humans has already been established.

Consider personal protective equipment (PPE) surveillance as an example. It’s often unwise to rely only on your own camera images to train a model to detect humans, since your data is likely to be far more limited than the hundreds of thousands or millions of images of people used to train open source models. To save months or even years, take an existing open source model and customise it for your use case. That way you can focus your efforts on identifying the things that are meaningful to your business.
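To make the idea concrete, here is a deliberately tiny sketch of transfer learning in plain Python. It is a toy under stated assumptions, not a production recipe: the `FROZEN_WEIGHTS` matrix is a hypothetical stand-in for a pretrained backbone, the three-number inputs stand in for images, and `train_head` is an illustrative name. The point it shows is the division of labour: the backbone is reused as-is, and only a small head is trained on your own (limited) data.

```python
import math

# Hypothetical stand-in for a pretrained backbone: fixed weights that are
# never updated. In a real system this would be a network pretrained on a
# large image data set.
FROZEN_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]


def backbone(pixels):
    """Map a raw input vector to features using the frozen weights (ReLU)."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=500, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    weights, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    features = [backbone(s) for s in samples]  # backbone output is reused
    for _ in range(epochs):
        for f, y in zip(features, labels):
            pred = sigmoid(sum(w * x for w, x in zip(weights, f)) + bias)
            error = pred - y
            # Only the head's parameters are updated, never the backbone.
            weights = [w - lr * error * x for w, x in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(head, pixels):
    """Probability that the input belongs to the positive class."""
    weights, bias = head
    f = backbone(pixels)
    return sigmoid(sum(w * x for w, x in zip(weights, f)) + bias)


# Four labelled examples stand in for "your own camera images".
samples = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0]]
labels = [1, 1, 0, 0]
head = train_head(samples, labels)
```

With real images the same pattern usually means loading a pretrained detection model from a deep learning library, freezing most of its layers, and fine-tuning the remaining ones on your labelled data.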

Understanding the flow: object tracking

Once you’ve established the capability to detect humans in your images, you’ll probably want to track them from frame to frame, at least if you are working with video, which is often the case in human detection. In practice, this means that the system can follow the same person as they move about in the frame. Tracking is a prerequisite for most video-based human detection systems, and it also adds value to, for example, people flow analysis. Your system will be able to understand how people actually move in physical spaces: are your pathways optimally designed, are there any notable bottlenecks, and so forth.
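One simple way to picture tracking is to match each detection in the current frame to the most overlapping box from the previous frame. The sketch below assumes boxes are given as `(x1, y1, x2, y2)` tuples and uses a greedy intersection-over-union (IoU) match. The class name `IouTracker` and the `min_iou` threshold are illustrative choices, and real trackers typically add motion models and more careful assignment.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy frame-to-frame tracker based on box overlap."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}        # track id -> last known box
        self._ids = count(1)    # source of fresh track ids

    def update(self, detections):
        """Assign a track id to every detection in the current frame."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in detections:
            best_id, best_score = None, self.min_iou
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # no overlap: start a new track
            else:
                del unmatched[best_id]      # each track is matched at most once
            assigned[best_id] = box
        self.tracks = assigned  # tracks not seen in this frame are dropped
        return assigned


tracker = IouTracker()
frame1 = tracker.update([(0, 0, 10, 20), (50, 0, 60, 20)])
frame2 = tracker.update([(52, 0, 62, 20), (2, 0, 12, 20)])
```

Here both people keep their ids in `frame2` even though the detections arrive in a different order. Note that this sketch forgets a person as soon as they are missing from a single frame, so they would get a new id on return.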

Building cache memory: unique object identification

Now that you have a system that can both identify humans and keep track of them as they move, you may want it to remember them even if they exit the frame. Consider a CCTV video feed of a room where you want to count how many people visit the room during a given day. People may leave the room for a while and then return, and the camera may have blind spots. If your system cannot identify and remember unique people, it will count a new person whenever someone steps out for a minute and returns, or walks behind a pillar and disappears from the image for a second. With unique object identification based on, for example, face or full-body features, your system can keep track of individual people even if they exit the frame. There are multiple methods for this. The DeepSORT algorithm, for instance, combines motion-based tracking with appearance features so that a person can be re-associated with their earlier track after an occlusion.
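The room-counting example can be sketched as a small gallery of appearance vectors. The code below assumes that some upstream model (not shown) turns each detected person into a numeric embedding, and that embeddings of the same person have high cosine similarity. `VisitorCounter`, the `threshold` value and the three-number embeddings are hypothetical and chosen only to keep the illustration readable.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VisitorCounter:
    """Remembers one embedding per unique person seen so far."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.gallery = []  # index in this list acts as the person id

    def observe(self, embedding):
        """Return the id of the matching known person, or register a new one."""
        best_id, best_score = None, self.threshold
        for person_id, known in enumerate(self.gallery):
            score = cosine_similarity(embedding, known)
            if score >= best_score:
                best_id, best_score = person_id, score
        if best_id is None:
            self.gallery.append(embedding)
            best_id = len(self.gallery) - 1
        return best_id

    @property
    def unique_visitors(self):
        return len(self.gallery)


counter = VisitorCounter()
first = counter.observe([1.0, 0.0, 0.1])       # a new person enters
second = counter.observe([0.0, 1.0, 0.1])      # a different person enters
returned = counter.observe([0.98, 0.05, 0.12]) # the first person comes back
```

In this run the third observation is matched to the first person, so `counter.unique_visitors` stays at 2 instead of growing to 3. A production system would also need to decide how long to keep embeddings and how to update them as lighting and viewing angle change.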

Recognizing gear: other object detection algorithms

In addition to detecting humans, your business case may require identifying other objects as well. For PPE surveillance, for example, you may wish to detect helmets, high-visibility clothing, protective shoes and, yes, face masks. You’ll therefore need additional models to detect these objects. Technically, they are often very similar to the base human detection algorithms, just trained to detect objects other than humans. As with human detection, transfer learning, open source datasets and backbone models can be very fruitful and cost-effective here.
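Once both people and gear are detected, the two sets of boxes still have to be connected. As a minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples in image coordinates where y grows downward, the check below flags people who have no helmet detection centred in the upper part of their box. The function name and the `head_fraction` parameter are illustrative assumptions, and a real system would need rules suited to its camera angles.

```python
def center(box):
    """Centre point of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(box, point):
    """True if the point lies inside the box (edges included)."""
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def people_without_helmets(people, helmets, head_fraction=0.3):
    """Return indices of person boxes with no helmet in their head region."""
    missing = []
    for index, (x1, y1, x2, y2) in enumerate(people):
        # The head region is the top slice of the person box.
        head_region = (x1, y1, x2, y1 + head_fraction * (y2 - y1))
        if not any(contains(head_region, center(h)) for h in helmets):
            missing.append(index)
    return missing


people = [(0, 0, 40, 100), (100, 0, 140, 100)]
helmets = [(10, 2, 30, 18)]
violations = people_without_helmets(people, helmets)
```

In this example the only helmet sits in the head region of the first person, so the second person (index 1) is reported.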

On top of these features, many other algorithms may be relevant depending on your business case. However, the features described here are relatively well established and can already produce versatile and viable human detection systems that can be implemented in a matter of weeks.

We’ve now scratched the surface of human detection through some practical use cases and common features. In the final part of this blog series, our Machine Learning Engineer Joni Karras will dive deeper into the technologies and recent advances in the field.

Kalle Kyyrö

