A multimodal perception system for detection of human operators in robotic work cells

The video presents a possible solution for robustly detecting human operators in a collaborative scenario. The solution serves safety purposes and also avoids the unnecessary robot stops or slowdowns caused by false positives. A novel multimodal perception system for human tracking has been developed, based on the fusion of depth and thermal images. A machine learning approach is pursued to achieve reliable detection performance in multi-robot collaborative systems.
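
As an illustration of what fusing depth and thermal images can look like at the input level, the following is a minimal sketch and not the system's actual implementation. It assumes the two images are already registered pixel to pixel, and it stacks them as two channels of one array that a CNN could consume. The function names, the value ranges, and the channel-stacking strategy itself are assumptions made for illustration.

```python
import numpy as np


def normalize(img, lo, hi):
    """Clip an image to [lo, hi] and rescale it to [0, 1]."""
    img = np.clip(np.asarray(img, dtype=np.float32), lo, hi)
    return (img - lo) / (hi - lo)


def fuse_depth_thermal(depth_m, thermal_c,
                       depth_range=(0.5, 5.0),      # metres, assumed sensor range
                       thermal_range=(15.0, 45.0)):  # degrees Celsius, assumed
    """Stack a registered depth/thermal pair into an (H, W, 2) array.

    Hypothetical helper: the images are assumed to be aligned already,
    so that pixel (i, j) refers to the same scene point in both.
    """
    depth_m = np.asarray(depth_m)
    thermal_c = np.asarray(thermal_c)
    if depth_m.shape != thermal_c.shape:
        raise ValueError("depth and thermal images must have the same shape")
    d = normalize(depth_m, *depth_range)
    t = normalize(thermal_c, *thermal_range)
    # Channel 0 carries geometry, channel 1 carries temperature.
    return np.stack([d, t], axis=-1)
```

With both cues in one input, a detector can in principle reject a hot object that lacks a human shape, as well as a human-shaped object that is not warm.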

The three videos demonstrate that, in a real collaborative scenario, the approach based on fused images (DT-CNN) is more effective than approaches that rely on a single source of perception data, i.e., only thermal data (T-CNN) or only depth data (D-CNN). The single-source approaches produce a high percentage of false positives when hot objects (T-CNN) or objects with human-like shapes (D-CNN) are mistaken for a human. The DT-CNN provides better human detection by minimizing false positives, which in turn allows the separation distance between the human operator and the robot to be computed correctly.
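
To make the role of the separation distance concrete, here is a minimal sketch of how it could be computed once the human has been detected. It assumes the human and the robot are each available as a set of 3D points in a common frame. That representation and the function name are hypothetical and are not taken from the described system.

```python
import numpy as np


def separation_distance(human_points, robot_points):
    """Return the minimum Euclidean distance between two 3D point sets.

    Hypothetical helper: both inputs are sequences of (x, y, z) points
    expressed in the same reference frame and in the same unit.
    Returns infinity when either set is empty (e.g. no human detected).
    """
    human = np.asarray(human_points, dtype=float)
    robot = np.asarray(robot_points, dtype=float)
    if human.size == 0 or robot.size == 0:
        return float("inf")
    human = human.reshape(-1, 3)
    robot = robot.reshape(-1, 3)
    # Pairwise differences via broadcasting: shape (n_human, n_robot, 3).
    diffs = human[:, None, :] - robot[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())
```

In this sketch, a false positive would add spurious points to `human_points` and could shrink the computed distance, which is how a misdetection can lead to an unnecessary stop or slowdown.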