In the past few years, computer vision researchers have witnessed a surge
of interest in human action analysis through videos. The objective of a human
action recognition system is to assign a label to an observed action; this task
is called action recognition. With recent technological advances, the
development of intelligent systems that make daily life more comfortable, safe,
and secure has gained importance, and human action recognition is one of the
features of such modern intelligent systems. However, the task is not limited
to labeling a specific action: it can be extended to behavior understanding
and scene interpretation. This kind of action analysis has a wide range of
applications, such as security monitoring, surveillance, Intelligent Transport
Systems, mobile robots, sports analysis and training of athletes,
virtual/augmented reality, and animation. Researchers are also investigating
other possible application areas.
Detecting objects such as pedestrians in unconstrained images or videos,
tracking them over time, and recognizing their actions are challenging tasks,
not only because of high intra-class variations in shape, appearance, scale,
viewpoint, and pose, but also because of occlusions, illumination changes, and
background clutter.
Researchers have built several public action data sets (e.g., KTH, Weizmann),
which provide good test beds for algorithm evaluation. Although these data sets
have become very popular, there is a considerable gap between these staged
samples and real-world scenarios. The majority of action data sets are
collected in well-controlled environments, while real-world actions often
happen in much more complex scenes.
In most current human action data sets, the actions are recorded against clean
backgrounds, and each video clip generally involves only one type of action
(e.g., running or jogging) and only one person, who keeps performing this
action throughout the whole clip. However, in real surveillance scenarios, the
background is often cluttered, and the surveillance system has to detect the
human actions of interest in a crowd. Fig. 1 shows an example of action
detection in a complex scene. Action detection in complex scenes is much more
difficult than in simple laboratory environments.
Figure 1. Illustration of the action detection
problem in complex scenes.
One of the most challenging tasks in intelligent monitoring technology is
analyzing human action in crowded scenes and detecting the people whose action
differs from that of the others. The detection of an abnormal action can
trigger video transmission and recording, and can be used to draw the attention
of a human observer to a particular video channel.
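As a rough illustration of flagging people who behave differently from the
rest of a crowd, the sketch below compares each person's direction of motion
with the crowd's dominant direction. It is only a toy example under strong
assumptions: it assumes a tracker has already produced one (vx, vy) velocity
per person, it uses direction of motion as a stand-in for "action", and the
function names and the 90-degree threshold are hypothetical choices made for
this example, not a method taken from any particular system.

```python
import math

def dominant_direction(velocities):
    """Angle (radians) of the summed velocity vector of all tracks."""
    sx = sum(vx for vx, _ in velocities)
    sy = sum(vy for _, vy in velocities)
    return math.atan2(sy, sx)

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def flag_deviating_tracks(velocities, max_angle=math.pi / 2):
    """Indices of tracks moving more than max_angle away from the main flow.

    velocities: list of (vx, vy) tuples, one per tracked person (assumed input).
    max_angle:  hypothetical threshold in radians.
    """
    main = dominant_direction(velocities)
    flagged = []
    for i, (vx, vy) in enumerate(velocities):
        if vx == 0 and vy == 0:
            continue  # stationary track: direction is undefined, so skip it
        if angular_difference(math.atan2(vy, vx), main) > max_angle:
            flagged.append(i)
    return flagged

# Three people walk roughly to the right, one walks to the left.
tracks = [(1.0, 0.0), (1.0, 0.1), (0.9, -0.1), (-1.0, 0.0)]
print(flag_deviating_tracks(tracks))
```

In a monitoring setting, a non-empty result could be the event that triggers
recording or alerts an operator. A real system would need far more robust
motion features than a single velocity per person.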
In complex scenes, e.g., with cluttered backgrounds or partially occluded
crowds, it is very difficult to locate the human body precisely. When trying to
crop an object from a complex scene without human intervention, we often have
to tolerate substantial misalignment or occasional drifting. In addition,
ambiguities may also exist in the temporal domain. A large portion of
real-world actions happen only once and last only a short time. Since human
action is continuous and its speed varies greatly even within the same action
category, it is not easy to decide the start point, the end point, or even the
duration of an action of interest in real-world scenarios. These temporal
ambiguities are hardly noticeable in repetitive actions, such as running and
jogging, but they may greatly affect detection performance when handling
non-repetitive actions such as picking up an item, taking a photo, or pushing
an elevator button. Such spatial and temporal ambiguities make the action
detection task considerably more difficult.
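One generic way to cope with an unknown start point and duration is to try
several candidate time windows and keep the one that scores best. The sketch
below illustrates that idea only. It assumes some detector has already
produced a score per frame (the scores list is hypothetical), and the function
names, candidate durations, and stride are choices made for this example
rather than part of any published method.

```python
def candidate_segments(num_frames, durations, stride):
    """Enumerate (start, end) frame windows, end exclusive, for each duration."""
    segments = []
    for length in durations:
        if length > num_frames:
            continue  # window longer than the clip: nothing to enumerate
        for start in range(0, num_frames - length + 1, stride):
            segments.append((start, start + length))
    return segments

def best_segment(scores, durations, stride):
    """Return the window with the highest mean per-frame score, or None."""
    best, best_mean = None, float("-inf")
    for start, end in candidate_segments(len(scores), durations, stride):
        mean = sum(scores[start:end]) / (end - start)
        if mean > best_mean:
            best, best_mean = (start, end), mean
    return best

# Hypothetical per-frame scores: the action occupies frames 3 to 6.
scores = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(best_segment(scores, durations=[4, 8], stride=1))
```

With several durations in the candidate set, a short one-off action such as
pushing a button is not forced into a window sized for a long action, at the
cost of scoring many more windows.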