Recognizing Scene Viewpoint using Panoramic Place Representation
Abstract
The pose of an object carries crucial semantic meaning for object manipulation and usage (e.g., grabbing a mug, watching a television). Just as pose estimation is part of object recognition, viewpoint recognition is a necessary and unavoidable component of scene recognition. For instance, as shown in Figure 1, a theater has a clear, distinct distribution of objects (a stage on one side and seats on the other) that defines unique views in different orientations. Just as observers will choose a view of a television that allows them to see the screen, observers in a theater will sit facing the stage when watching a show.
The goal of this paper is to study the viewpoint recognition problem in scenes. We aim to design a model which, given a photo, can classify the place category to which it belongs (e.g., a theater) and predict the direction in which the observer is facing within that place (e.g., towards the stage). Our model learns the typical arrangement of visual features in a 360-degree panoramic representation of a place, and learns to map individual views of a place to that representation. Then, given an input photo, the model can place that photo within a larger panoramic image. This allows us to extrapolate the layout beyond the available view, as if we were to rotate the camera all around the observer.
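To make the idea of placing a view inside a 360-degree panorama concrete, here is a toy sketch. It is not the paper's model; it assumes, hypothetically, that a panorama and a photo have each been reduced to one scalar feature per azimuth column, and it searches every circular offset for the best match. The wrap-around indexing is what distinguishes a full 360-degree panorama from an ordinary wide image. The names `best_azimuth`, `pano_cols` and `view_cols` are illustrative only.

```python
import math

def best_azimuth(pano_cols, view_cols):
    """Toy alignment of a view against a 360-degree panorama.

    Hypothetical setup: each list holds one scalar feature per azimuth
    column, and pano_cols spans the full circle. Returns
    (azimuth_degrees_of_view_centre, sum_of_squared_differences).
    """
    n, m = len(pano_cols), len(view_cols)
    if m > n:
        raise ValueError("view cannot be wider than the panorama")
    best_offset, best_cost = 0, math.inf
    for offset in range(n):
        # Index modulo n so the window wraps past the panorama's right edge.
        cost = sum((pano_cols[(offset + j) % n] - view_cols[j]) ** 2
                   for j in range(m))
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    # Column i spans [i, i + 1); the view's centre is half its width in.
    center = (best_offset + m / 2) % n
    return 360.0 * center / n, best_cost
```

For example, `best_azimuth([0, 1, 2, 3, 4, 5, 6, 7], [7, 0, 1])` matches the view across the seam (columns 7, 0, 1) and returns `(22.5, 0)`; a non-circular search would miss that placement entirely.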
Example_Results_on_Panorama.pdf:
This file contains more examples of result visualization on our panorama dataset. It is an extension of Figure 8 in the paper.
Example_Results_on_SUN.pdf:
This file contains more examples of result visualization on the SUN dataset. It is an extension of Figure 8 in the paper.
Algorithm Analysis
Algorithm_Analysis.pdf:
This file contains further analysis of the algorithm and its relation to similar algorithms.
Geometry of Panorama
panorama.pdf:
This file contains an explanation of the geometry of panoramic images.
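As a rough guide to the kind of geometry involved, and assuming the panorama is stored in the common equirectangular format (a W x H image spanning 360 degrees horizontally and 180 degrees vertically), one common convention maps pixel column u and row v to azimuth theta and elevation phi, and then to a unit viewing direction d with x pointing right, y up and z forward. This is a generic convention, not necessarily the one used in panorama.pdf.

```latex
\theta = 2\pi\,\frac{u}{W} - \pi, \qquad
\phi = \frac{\pi}{2} - \pi\,\frac{v}{H}, \qquad
\mathbf{d} = \left(\cos\phi\,\sin\theta,\; \sin\phi,\; \cos\phi\,\cos\theta\right),
\\[4pt]
u = \frac{W\,(\theta + \pi)}{2\pi}, \qquad
v = \frac{H\,(\pi/2 - \phi)}{\pi}.
```

Under this convention the center column (u = W/2) faces forward (theta = 0) and the top row (v = 0) points straight up; the inverse mapping is what a view-rendering tool evaluates for each camera ray.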
Performance
Performance_Table.pdf:
This file contains an extended version of Tables 1 and 2 in the paper, showing performance by category.
More materials
Border_Extension.pdf:
This file contains examples of boundary extension, which extrapolates an image beyond its borders using texture synthesis.
MTurk_View_Matching_GUI.png:
This file shows the Amazon Mechanical Turk GUI that workers used to label the viewpoint of pictures from the SUN dataset.
To generate normal field-of-view images from a panorama, download the code "pano2photo.zip" in the source code download section.
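The released pano2photo code performs this warp; the sketch below is an independent illustration, not a port of it. It assumes the panorama is an equirectangular image stored as a list of pixel rows covering 360 by 180 degrees, and the function name `pano_to_photo` and its parameters (`fov_deg`, `yaw_deg`, `pitch_deg`) are hypothetical. For each output pixel it casts a pinhole-camera ray, tilts it by the pitch, pans it by the yaw, converts it to azimuth and elevation, and samples the nearest panorama pixel.

```python
import math

def pano_to_photo(pano, out_w, out_h, fov_deg, yaw_deg=0.0, pitch_deg=0.0):
    """Render a pinhole-camera view from an equirectangular panorama.

    pano: list of H rows, each a list of W pixel values (any type).
    fov_deg: horizontal field of view; yaw_deg/pitch_deg: view direction.
    Uses nearest-neighbour sampling to keep the sketch short.
    """
    H, W = len(pano), len(pano[0])
    f = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Ray through the pixel centre, camera frame: x right, y up, z forward.
            x = c + 0.5 - out_w / 2
            y = out_h / 2 - (r + 0.5)
            z = f
            # Tilt by pitch (rotation about the x axis; positive looks up).
            y, z = y * cp + z * sp, -y * sp + z * cp
            # Pan by yaw (rotation about the y axis; positive turns right).
            x, z = x * cy + z * sy, -x * sy + z * cy
            theta = math.atan2(x, z)               # azimuth in (-pi, pi]
            phi = math.atan2(y, math.hypot(x, z))  # elevation in [-pi/2, pi/2]
            # Equirectangular lookup; wrap columns, clamp the bottom row.
            u = int((theta + math.pi) / (2 * math.pi) * W) % W
            v = min(int((math.pi / 2 - phi) / math.pi * H), H - 1)
            row.append(pano[v][u])
        out.append(row)
    return out
```

A production version would interpolate bilinearly and vectorize the loops, but the per-pixel geometry stays the same.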
To download the images and other data we actually used in the experiments for scene viewpoint recognition, download the file "cvpr2012pano_codeRelease_v1.zip" in the source code download section.
Note: Object annotation on this dataset is in progress.
CVPR2012 code: This folder contains all source code and data used in the experiments. It contains all precomputed results as well as source code to recompute everything from scratch.
If you only want to run the viewpoint recognition experiment and compare with our paper, you need to download only this file; there is no need to download the SUN360 database from the links above.
pano2photo: This is a small piece of code that demonstrates how to warp between panoramas and normal images. It is also included in the CVPR2012 code package above.
polarPlot: This is a small piece of code to plot a curve or a histogram in polar coordinates. It is also included in the CVPR2012 code package above.
OnlineStructuralSVM: a Matlab implementation of the cutting plane algorithm for training a Structural SVM.
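polarPlot itself is MATLAB code; the step that is easy to get wrong when turning viewpoint angles into a polar histogram is binning on a circle, where 359 degrees and 1 degree are neighbours. The sketch below, in Python with the hypothetical name `polar_histogram`, shows one way to do that binning with sectors centred on multiples of 360/num_bins.

```python
def polar_histogram(angles_deg, num_bins=8):
    """Count angles (degrees) into num_bins equal sectors around the circle.

    Bin k covers [k*w - w/2, k*w + w/2) with w = 360/num_bins, so bin 0 is
    centred on 0 degrees. Angles outside [0, 360) are wrapped first.
    """
    width = 360.0 / num_bins
    counts = [0] * num_bins
    for a in angles_deg:
        # Shift by half a sector so bin 0 straddles 0 degrees, then wrap.
        k = int(((a % 360.0) + width / 2) // width) % num_bins
        counts[k] += 1
    return counts
```

For instance, `polar_histogram([0, 10, -10, 350, 90, 180])` puts the four angles near 0 degrees into the same bin and gives `[4, 0, 1, 0, 1, 0, 0, 0]`. To draw the result, the counts can be passed to a bar plot on matplotlib polar axes (e.g. `ax = fig.add_subplot(projection="polar")` followed by `ax.bar(...)` at the bin-centre angles in radians).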
Acknowledgments
We thank Tomasz Malisiewicz, Andrew Owens, Aditya Khosla, Dahua Lin and reviewers for helpful discussions. This work is funded by NSF grant (1016862) to A.O., Google research awards to A.O. and A.T., ONR MURI N000141010933 and NSF Career Award No. 0747120 to A.T., and an NSF Graduate Research Fellowship to K.A.E.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation and other funding agencies.
All materials on this website, including images, data, and visualizations, may be used for academic research purposes ONLY.