Dynamic Scene Understanding: The Role of Orientation Features in Space and Time in Scene Classification

Contributors

  • Konstantinos G. Derpanis
  • Christoph Feichtenhofer (TU Graz)
  • Matthieu Lecce (University of Pennsylvania)
  • Axel Pinz (TU Graz)
  • Kostas Daniilidis (University of Pennsylvania)
  • Richard P. Wildes

Overview

[Figure] YUPENN Dynamic Scenes data set overview: sample frames of all scenes from the YUPENN Dynamic Scenes data set.

Natural scene classification is a fundamental challenge toward the goal of automated scene understanding.  To date, the majority of studies have limited their scope to single still images and thereby ignore potentially informative temporal cues.  This work is concerned with determining the degree of performance gain available from considering short videos when recognizing natural scenes. Toward this end, the impact of multi-scale orientation measurements on scene classification is systematically investigated, as related to:

  • spatial appearance
  • temporal dynamics
  • joint spatial appearance and dynamics

In addition, a new data set (YUPENN Dynamic Scenes) is introduced that contains 420 image sequences spanning fourteen scene categories, with temporal scene variation due to objects and surfaces decoupled from camera-induced motion. This data set is used to evaluate the classification performance of the various orientation-related representations, as well as state-of-the-art alternatives.

Approach

Approach overview.  The input image sequence is spatially subdivided; the outer scale is determined by the spatiotemporal support of the individual subdivided regions, while the relative position of the subdivided regions captures scene layout. Next, a distributed feature set is extracted within each subdivision and concatenated to form the global feature descriptor, e.g., a distribution of spacetime orientation measurements indicating the relative presence of the chosen orientation set.  Finally, the feature descriptor is classified using a nearest neighbor (NN) classifier.
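
As a rough illustration of this pipeline, the Python sketch below subdivides a video into an outer-scale grid, computes a per-region histogram, and concatenates the regions in a fixed order into a single global descriptor. It is not the authors' implementation: the grid size, bin count, and the simple gradient-orientation histogram (standing in for the multi-scale spacetime oriented energy features) are illustrative assumptions.

```python
# Minimal sketch of grid-based feature aggregation (illustrative only).
import numpy as np

def region_feature(clip, bins=8):
    """Placeholder per-region descriptor: histogram of spatial gradient
    orientations, weighted by spatiotemporal gradient magnitude."""
    gt, gy, gx = np.gradient(clip.astype(np.float64))  # axes: t, y, x
    mag = np.sqrt(gx**2 + gy**2 + gt**2)
    ang = np.arctan2(gy, gx) % np.pi                   # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)                  # L1-normalize

def global_descriptor(video, grid=(4, 4, 1), bins=8):
    """Subdivide the video (T, H, W) into an outer-scale grid, compute a
    histogram per subregion, and concatenate in a fixed order so that the
    relative position of regions (scene layout) is preserved."""
    T, H, W = video.shape
    gy_, gx_, gt_ = grid   # spatial-by-spatial-by-temporal subdivision
    feats = []
    for ti in range(gt_):
        for yi in range(gy_):
            for xi in range(gx_):
                clip = video[ti*T//gt_:(ti+1)*T//gt_,
                             yi*H//gy_:(yi+1)*H//gy_,
                             xi*W//gx_:(xi+1)*W//gx_]
                feats.append(region_feature(clip, bins))
    return np.concatenate(feats)

# Example: a random grayscale video stands in for a real input sequence.
video = np.random.rand(30, 128, 160)
descriptor = global_descriptor(video)
print(descriptor.shape)  # (4*4*1*8,) = (128,)
```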

There are two key parts to the analysis of dynamic scene recognition considered in this work (see Approach overview above):

1. a representation based on the global layout of multi-scale local feature measurements that are aggregated across spacetime image subregions
  • Features evaluated:
    • spatial appearance: color and GIST
    • temporal dynamics: histogram of flow (HOF), chaos, and appearance-marginalized spatiotemporal oriented energies (MSOE)
    • joint spatial appearance and dynamics: chaos fused with GIST, HOF fused with GIST, and spatiotemporal oriented energies (SOE); in addition, color fused with each of the spatiotemporal features
2. a match measure between any two samples under consideration for classification
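
A minimal sketch of the second part, the match measure and nearest neighbor classification, is given below. It assumes global descriptors of the form produced by the sketch above; the Bhattacharyya coefficient used as the histogram similarity and the toy data are illustrative assumptions rather than the exact measure reported in the paper.

```python
# Minimal sketch of histogram matching and NN classification (illustrative).
import numpy as np

def bhattacharyya(h1, h2):
    """Similarity between two L1-normalized histograms (1.0 = identical)."""
    return np.sum(np.sqrt(h1 * h2))

def nn_classify(query, train_descriptors, train_labels):
    """Assign the label of the training sample whose global descriptor is
    most similar to the query under the chosen match measure."""
    scores = [bhattacharyya(query, d) for d in train_descriptors]
    return train_labels[int(np.argmax(scores))]

# Usage with toy data standing in for global descriptors of training videos.
rng = np.random.default_rng(0)
train = [rng.random(128) for _ in range(3)]
train = [h / h.sum() for h in train]        # L1-normalize each descriptor
labels = ["beach", "beach", "street"]
query = train[2] + 0.01 * rng.random(128)   # slightly perturbed copy
query /= query.sum()
print(nn_classify(query, train, labels))    # likely "street"
```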


Results

Evaluation was conducted on two data sets: (i) the Maryland "In-the-Wild" data set and (ii) the new YUPENN Dynamic Scenes data set.  Since the Maryland data set contains large camera motions and scene cuts, it is difficult to determine whether an approach's classification performance depends on its success in capturing the underlying scene structure or on characteristics induced by the camera.  To shed light on this question, the YUPENN data set is introduced, which contains a wide variety of dynamic scenes captured from stationary cameras.  It is found that spacetime oriented energies designed to capture both spatial appearance and temporal dynamics are the best performer on the stabilized data set.  Overall, the spacetime oriented energy feature consistently characterizes dynamic scenes, whether operating in the presence of strictly scene dynamics (stabilized case) or when confronted with overlaid, non-trivial camera motions (in-the-wild case). The alternative approaches considered are less capable of such wide-ranging performance.


Supplemental Material

  • The YUPENN Dynamic Scenes data set is available at our data set page.

Related Publications

Last updated: July 23, 2014