
Monday, February 6, 2012

Human action recognition - Background



In the past few years, computer vision researchers have witnessed a surge of interest in analyzing human actions in video. The objective of a human action recognition system is to assign a label to an action; this task is called action recognition. With recent technological advances, the development of intelligent systems that make daily life more comfortable, safe, and secure has become increasingly important, and human action recognition is one of the features of such systems. The task is not limited to labeling a specific action, however: its scope may be extended to behavior understanding and scene interpretation. This kind of action analysis has a wide range of applications, such as security monitoring, surveillance, intelligent transport systems, mobile robots, sports analysis and athlete training, virtual/augmented reality, and animation. Researchers are also investigating other possible application areas.

Detecting objects such as pedestrians in unconstrained images or videos, tracking them over time, and recognizing their actions are challenging tasks, not only because of high intra-class variation in shape, appearance, scale, viewpoint, and pose, but also because of occlusions, illumination changes, and background clutter.

Researchers have built several public action data sets (e.g., KTH, Weizmann), which provide good test beds for algorithm evaluation. Although these data sets have become very popular, there is a considerable gap between their staged samples and real-world scenarios. Most action data sets are collected in well-controlled environments, whereas real-world actions often happen in much more complex scenes.

In most current human action data sets, the actions are recorded against clean backgrounds, and each video clip generally involves only one type of action (e.g., running or jogging) and only one person, who keeps performing that action for the whole clip. However, in real surveillance scenarios, the background is often cluttered, and the surveillance system has to detect the human actions of interest within a crowd. Fig. 1 shows an example of action detection in a complex scene. Action detection in complex scenes is much more difficult than in simple laboratory environments.

 Figure 1. Illustration of the action detection problem in complex scenes.

One of the most challenging tasks in intelligent monitoring technology is analyzing human action in crowded scenes and detecting the people whose actions differ from those of the others. The detection of an abnormal action can trigger video transmission and recording, and can be used to draw the attention of a human observer to a particular video channel.
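As a rough illustration of "detecting the people whose action differs from the others", the sketch below assumes that some upstream recognizer has already assigned an action label to each tracked person. It is not a method from the referenced papers. The function name `flag_deviating_people`, the person ids, and the label strings are all hypothetical, and "abnormal" is simplified here to "not the majority label in the scene".

```python
from collections import Counter


def flag_deviating_people(labels_by_person):
    """Return the ids of people whose label differs from the majority label.

    labels_by_person maps a (hypothetical) person id to the action label
    that an upstream recognizer assigned to that person.
    """
    if not labels_by_person:
        return []
    counts = Counter(labels_by_person.values())
    # The most common label stands in for the "main" action of the crowd.
    majority_label, _ = counts.most_common(1)[0]
    return sorted(
        person_id
        for person_id, label in labels_by_person.items()
        if label != majority_label
    )


# Hypothetical scene: three people walk, one runs.
crowd = {"p1": "walking", "p2": "walking", "p3": "running", "p4": "walking"}
flagged = flag_deviating_people(crowd)
print(flagged)
```

In a monitoring setting, a non-empty result could serve as the trigger for recording or for alerting an observer. A real system would also need to handle noisy labels and scenes with no clear majority.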

In complex scenes, e.g., with cluttered backgrounds or partially occluded crowds, it is very difficult to locate the human body precisely. When trying to crop an object from a complex scene without human interaction, we often have to tolerate substantial misalignment or occasional drifting. In addition, ambiguities may exist in the temporal domain. A large portion of real-world actions happen only once and last only a short time. Since human action is continuous and its speed varies greatly even within the same action category, it is not easy to decide the start point, the end point, or even the duration of the actions of interest in real-world scenarios. These temporal ambiguities are hardly noticeable in repetitive actions such as running and jogging, but they may greatly affect detection performance for non-repetitive actions such as picking up an item, taking a photo, or pushing an elevator button. Such spatial and temporal ambiguities make the action detection task considerably harder.
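To make the temporal ambiguity concrete, here is a minimal sketch under stated assumptions, not the approach of any referenced paper. When the start, end, and duration of an action are unknown, one simple option is to enumerate candidate windows of several durations and score each one. The per-frame scores, the function names, and the `bias` value are hypothetical. Each window is scored by the sum of (frame score minus `bias`), so frames below the bias count against a window.

```python
def candidate_windows(num_frames, durations, stride=1):
    """Enumerate (start, end) frame ranges, end exclusive, per duration."""
    windows = []
    for duration in durations:
        for start in range(0, num_frames - duration + 1, stride):
            windows.append((start, start + duration))
    return windows


def best_window(frame_scores, durations, stride=1, bias=0.5):
    """Return the window with the highest sum of (score - bias), or None."""
    windows = candidate_windows(len(frame_scores), durations, stride)
    if not windows:
        return None

    def window_score(window):
        start, end = window
        return sum(score - bias for score in frame_scores[start:end])

    return max(windows, key=window_score)


# Hypothetical per-frame scores: the action occupies frames 2, 3, and 4.
scores = [0, 0, 1, 1, 1, 0, 0, 0]
print(best_window(scores, durations=[2, 3, 4]))
```

The number of candidates grows with the number of durations tried, which is one reason short, non-repetitive actions are harder to localize than repetitive ones that fill the whole clip.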


References:

[1] Yuxiao Hu, Liangliang Cao, Fengjun Lv, Shuicheng Yan, Yihong Gong, and Thomas S. Huang. Action Detection in Complex Scenes with Spatial and Temporal Ambiguities. In ICCV 2009.
[2] S. M. Ashik Eftakhar, Joo Kooi Tan, Hyoungseop Kim, and Seiji Ishikawa. Direction-oriented Human Motion Recognition with Prior Estimation of Directions. In IEEE 2011.
[3] Mison Park, Joo Kooi Tan, Yuuki Nakashima, Hyoungseop Kim, and Seiji Ishikawa. Detecting Human Flows on a Road Different from Main Flows. In AROB 2011.
[4] Juergen Gall, Angela Yao, Nima Razavi, Luc Van Gool, and Victor Lempitsky. Hough Forests for Object Detection, Tracking, and Action Recognition. IEEE TPAMI, vol. 33, no. 11, November 2011.
