Gesture Recognition


  • Konstantinos G. Derpanis
  • Richard P. Wildes
  • John K. Tsotsos


A common approach to modeling hand gestures (e.g., American Sign Language, ASL) in the computer vision literature is to build an explicit model for each gesture in the lexicon. A limitation of this approach is that it does not scale to languages with large vocabularies. For example, a recent dictionary of ASL documented over 4500 signs. Like speech, however, ASL can be described linguistically in terms of a small number of basic parts, termed phonemes. The parts that make up ASL gestures can be broadly categorized as location ("Where on the body is the gesture made?"), handshape ("How are the hand(s) articulated?") and movement ("How do the hand(s) move?"). Basing a recognition system on a phonemic decomposition provides a powerful paradigm, since the number of phonemes to be modeled is small relative to the number of gestures at the lexical level.

Our recent efforts have concentrated on modeling and recognizing the phonemic movements of ASL. Our current approach extracts kinematic features from the apparent motion observed by a single camera and combines them into distinctive signatures for 14 single-handed, rigid phonemic movements of ASL. The approach has been implemented in software and evaluated on a database of 592 gesture sequences, achieving an overall recognition rate of 97.13%.
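
The scaling argument for a phonemic decomposition can be made concrete with a small sketch. Everything in it is hypothetical: the phoneme inventories, the `Sign` tuple, the `GLOSS_*` labels and the helper functions are illustrative placeholders, not real ASL inventories and not the authors' software. The sketch only counts how many models a whole-gesture approach and a phonemic approach would each need, and shows how separately recognized parts could index a lexicon.

```python
from itertools import product
from typing import NamedTuple, Optional

# Hypothetical phoneme inventories for illustration only; these are NOT
# actual ASL inventories, and their sizes are arbitrary.
LOCATIONS = ["chin", "chest", "neutral_space"]
HANDSHAPES = ["flat_B", "fist_A", "index_1", "spread_5"]
MOVEMENTS = ["up", "down", "circle", "twist"]


class Sign(NamedTuple):
    """A sign described as one choice from each phoneme category."""
    location: str
    handshape: str
    movement: str


def lexical_models(signs):
    """Whole-gesture approach: one model per distinct sign in the lexicon."""
    return len(set(signs))


def phonemic_models(signs):
    """Phonemic approach: one model per distinct phoneme the lexicon uses."""
    locations = {s.location for s in signs}
    handshapes = {s.handshape for s in signs}
    movements = {s.movement for s in signs}
    return len(locations) + len(handshapes) + len(movements)


# Upper bound: treat every combination of parts as a possible sign.
all_signs = [Sign(*combo) for combo in product(LOCATIONS, HANDSHAPES, MOVEMENTS)]

# A tiny hypothetical lexicon keyed by phoneme triples (glosses are placeholders).
LEXICON = {
    Sign("chin", "flat_B", "down"): "GLOSS_1",
    Sign("chest", "fist_A", "circle"): "GLOSS_2",
}


def lookup(location: str, handshape: str, movement: str) -> Optional[str]:
    """Map independently recognized phonemes to a lexical entry, if one exists."""
    return LEXICON.get(Sign(location, handshape, movement))


if __name__ == "__main__":
    print(lexical_models(all_signs))   # 48 (3 * 4 * 4)
    print(phonemic_models(all_signs))  # 11 (3 + 4 + 4)
    print(lookup("chin", "flat_B", "down"))  # GLOSS_1
```

The number of whole-gesture models grows with the product of the inventory sizes, while the number of phoneme models grows only with their sum; this is the asymmetry the phonemic paradigm exploits.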

Related Publications: