A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition
Richard P. Wildes
This work provides an analytic approach to spatiotemporal ConvNet realization. It introduces a novel hierarchical spatiotemporal orientation representation for spacetime image analysis, dubbed SOE-Net. It is designed to combine the benefits of the multilayer architecture of ConvNets with a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that, unlike most contemporary convolutional networks, no learning is involved; rather, all design decisions are specified analytically with theoretical motivations. This approach makes it possible to understand what information is being extracted at each stage and layer of processing, as well as to minimize heuristic choices in design. Another key aspect of the network is its recurrent nature, whereby the output of each layer of processing feeds back to the input. To keep the network size manageable across layers, a novel cross-channel feature pooling is proposed. The resulting multilayer architecture systematically reveals hierarchical image structure in terms of multiscale, multiorientation properties of visual spacetime. To illustrate its utility, the network has been applied to the task of dynamic texture recognition. Empirical evaluation on multiple standard datasets shows that it sets a new state of the art.
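The layer structure described above (oriented filtering, rectification, normalization, cross-channel pooling, and recurrence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the first-derivative-of-Gaussian kernels, the six `DIRECTIONS`, the pooling-by-summation form, and the helper names (`oriented_kernel`, `conv3d_circular`, `soe_layer`, `soe_net`, `descriptor`) are all hypothetical choices made for illustration, and the convolution uses periodic boundaries for simplicity.

```python
import numpy as np

# Hypothetical unit spacetime directions (x, y, t); NOT SOE-Net's actual filter bank.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]


def oriented_kernel(direction, sigma=1.0, radius=3):
    """First derivative of a 3D Gaussian taken along `direction` (assumed filter type)."""
    r = np.arange(-radius, radius + 1)
    x, y, t = np.meshgrid(r, r, r, indexing="ij")
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    g = np.exp(-(x**2 + y**2 + t**2) / (2.0 * sigma**2))
    k = -(d[0] * x + d[1] * y + d[2] * t) * g
    return k / np.abs(k).sum()  # unit L1 norm keeps responses bounded


def conv3d_circular(volume, kernel):
    """Same-size 3D convolution with periodic boundaries, computed via the FFT."""
    padded = np.zeros(volume.shape)
    kx, ky, kt = kernel.shape
    padded[:kx, :ky, :kt] = kernel
    # Move the kernel centre to index (0, 0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(kx // 2), -(ky // 2), -(kt // 2)), axis=(0, 1, 2))
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * np.fft.fftn(padded)))


def soe_layer(channels, kernels, eps=1e-6):
    """One layer: filter, rectify by squaring, pool across channels, normalize.

    channels: array of shape (C, X, Y, T); returns shape (len(kernels), X, Y, T).
    """
    energy = np.stack([
        # Cross-channel pooling (assumed form): sum the energies of one
        # orientation over all input channels, so the output has len(kernels) channels.
        sum(conv3d_circular(c, k) ** 2 for c in channels)
        for k in kernels
    ])
    # Divisive normalization across orientations at every spacetime point.
    return energy / (energy.sum(axis=0, keepdims=True) + eps)


def soe_net(video, directions=DIRECTIONS, num_layers=3):
    """Recurrent stack: each layer's output channels are the next layer's input."""
    kernels = [oriented_kernel(d) for d in directions]
    x = video[np.newaxis]  # a grayscale video is a single input channel
    outputs = []
    for _ in range(num_layers):
        x = soe_layer(x, kernels)
        outputs.append(x)
    return outputs


def descriptor(outputs):
    """Global spacetime average per channel, concatenated over layers."""
    return np.concatenate([o.mean(axis=(1, 2, 3)) for o in outputs])


video = np.random.default_rng(0).random((16, 16, 16))  # toy (X, Y, T) input
layers = soe_net(video)
print([o.shape for o in layers])  # every layer has len(DIRECTIONS) channels
print(descriptor(layers).shape)
```

In this sketch, cross-channel pooling is what keeps the recurrence tractable: without it, feeding D orientation channels back through D filters would give D^L channels at layer L (6, 36, 216 for three layers here), whereas pooling holds every layer at D channels.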
SOE-Net at Work
This video provides synthetic and real-life examples illustrating the emergence of abstract features through the proposed recurrent processing.
SOE-Net has been evaluated on the task of dynamic texture recognition according to standard protocols on the two most recent dynamic texture datasets, YUVL (YUVL1, YUVL2, YUVL3) and DynTex (Alpha, Beta, Gamma, Dyntex_35, Dyntex++).
Benefits of multiple layers and scales
Comparison to a learning based spatiotemporal ConvNet
Comparison to Dynamic Texture recognition state-of-the-art
For more results and discussion, please refer to the full paper.
K. Derpanis and R. Wildes, "Spacetime texture representation and recognition based on spatiotemporal orientation analysis," PAMI, vol. 34, pp. 1193-1205, 2012.
R. Péteri, S. Fazekas, and M. Huiskes, "DynTex: A comprehensive database of dynamic textures," Pattern Recognition Letters, vol. 31, pp. 1627-1632, 2010.
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in ICCV, 2015.
B. Ghanem and N. Ahuja, "Maximum margin distance learning for dynamic texture recognition," in ECCV, 2010.
A. Mumtaz, E. Coviello, G. Lanckriet, and A. B. Chan, "Clustering dynamic textures with the hierarchical EM algorithm for modeling video," PAMI, vol. 35, pp. 1606-1621, 2013.
M. Harandi, C. Sanderson, C. Shen, and B. Lovell, "Dictionary learning and sparse coding on Grassmann manifolds: An extrinsic solution," in ICCV, 2013.
Y. Quan, Y. Huang, and H. Ji, "Dynamic texture recognition via orthogonal tensor dictionary learning," in ICCV, 2015.
Y. Xu, Y. Quan, H. Ling, and H. Ji, "Dynamic texture classification from fractal analysis," in ICCV, 2011.
Y. Xu, S. Huang, H. Ji, and C. Fermüller, "Scale-space texture description on SIFT-like textons," CVIU, vol. 116, pp. 999-1013, 2012.
H. Ji, X. Yang, H. Ling, and Y. Xu, "Wavelet domain multifractal analysis for static and dynamic texture classification," TIP, vol. 22, pp. 286-299, 2013.
G. Zhao and M. Pietikäinen, "Dynamic texture recognition using local binary patterns with an application to facial expressions," PAMI, vol. 29, pp. 915-928, 2007.
S. Dubois, R. Péteri, and M. Ménard, "Characterization and recognition of dynamic textures based on the 2D+T curvelet transform," Sig. Im. & Vid. Proc., vol. 9, pp. 819-830, 2013.

Last updated: December 20, 2017