COS598C Spring 2014: Scene Understanding

Overview:

This class lays the foundation for research in scene understanding, an area of computer vision, by focusing on important topics from a practical point of view. We will review popular approaches and discuss the fundamental principles underlying scene understanding. Readings will be a mixture of computer vision papers and influential works from cognitive psychology. We will also emphasize implementation techniques that leverage computational power, crowdsourcing, and big data for computer vision research in general.

Schedule:

Date | Topic | Presenter | Slide + Code | Reading
Feb 3 Mon | Introduction + Camera Model | Jianxiong Xiao | pptx pdf panorama
  •   [HZ] Multiple view geometry in computer vision.
  •   [SingleViewMetrology] Single view metrology.
  •   [ObjectPerspective] Putting objects in perspective.
  •   [LabelMe3D] Building a database of 3d scenes from user annotations.
Feb 5 Wed | Class Canceled (Severe Weather)
Feb 10 Mon | Linear Algebra Review + Two View Geometry | Fisher Yu | key

    pdf

    [SFMedu code]

    [Direct code]

    [Consistency code]
  •   [HZ] Multiple view geometry in computer vision.
  •   [PhotoTourism] Photo tourism: exploring photo collections in 3D.
  •   [QuasiDense] A quasi-dense approach to surface reconstruction from uncalibrated images.
  •   [ceres-solver] Ceres Solver.
Feb 12 Wed | Structure From Motion + Stereo Matching | Fisher Yu
  •   [PMVS] Accurate, dense, and robust multiview stereopsis.
Feb 17 Mon | Factorization for SFM + Non-rigid SFM + Direct Method for RGBD | Fisher Yu
  •   [Nonrigid3D] Recovering non-rigid 3D shape from image streams.
  •   [NonrigidSFM] Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors.
  •   [DirectMethod] Robust odometry estimation for rgb-d cameras.
  •   [DirectMethodICCV] Real-Time Visual Odometry from Dense RGB-D Images.
Feb 19 Wed | Kinect Fusion | Sema Berkiten | pdf

    key

    [KinFu code]

    [SUN3Dsfm code]

    [SiftFu code]

  •   [KinectFusion] KinectFusion: Real-time dense surface mapping and tracking.
  •   [EfficientICP] Efficient variants of the ICP algorithm.
  •   [GeneralizedICP] Generalized-ICP.
  •   [LargeKinectFusion] Scalable real-time volumetric surface reconstruction.
  •   [Kintinuous] Kintinuous: Spatially extended kinectfusion.
  •   [KintinuousLoop] Deformation-based loop closure for large scale dense rgb-d slam.
  •   [KintinuousRobust] Robust real-time visual odometry for dense RGB-D mapping.
  •   [NonRigid] Robust Single-View Geometry And Motion Reconstruction.
  •   [SelfPortraits] 3D Self-Portraits.
  •   [KeyFrameFusion] On unifying key-frame and voxel-based dense visual SLAM at large scales.
  •   [HDRslam] 3D High Dynamic Range dense visual SLAM and its application to real-time object re-lighting.
  •   [SuperResolutionSLAM] Super-Resolution 3D Tracking and Mapping.
  •   [Elastic] Elastic Fragments for Dense Scene Reconstruction.
Feb 24 Mon | Convolutional Neural Network | Zhirong Wu | pdf

    [Jianxiong's note]

    [Matlab Demo]

    [Web Demo]

    [Alex Code]

    [Caffe Code]
  •   [CNNnote] Notes on convolutional neural networks.
  •   [ParallelCognition] The parallel distributed processing approach to semantic cognition.
  •   [Connectionist] Learning and connectionist representations.
  •   [DCNN] Imagenet classification with deep convolutional neural networks.
  •   [LecunNet] Backpropagation applied to handwritten zip code recognition.
  •   [BestCNN] Visualizing and Understanding Convolutional Neural Networks.
  •   [Caffe] Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding.
  •   [DeepDetection] Deep Neural Networks for Object Detection.
  •   [BengioRepresentation] Representation learning: A review and new perspectives.
Feb 26 Wed | Autoencoder | David Dohan | pptx

    pdf

    [Autoencoder Code]

    [RBM code]

    [DBM code]
  •   [AutoEncoder] Reducing the dimensionality of data with neural networks.
Mar 3 Mon | RBM + DBM + DBN | David Dohan
  •   [RBM] Restricted Boltzmann machines for collaborative filtering.
  •   [DBM] Deep boltzmann machines.
  •   [DBN] A fast learning algorithm for deep belief nets.
Mar 5 Wed | Vision and Action: Reinforcement + Apprenticeship Learning | Chenyi Chen | pdf

    pptx

    [demo]
  •   [DeepRL] Playing Atari with Deep Reinforcement Learning.
  •   [ApprenticeshipLearning] Apprenticeship learning via inverse reinforcement learning.
Mar 10 Mon | GPU Programming | Maciej Halber | pdf

    key

    [example code]
    CUDA C Programming Guide
    GPU Programming in MATLAB
    GPUmat
Mar 12 Wed | MRF + CRF + GC + LBP | Huiwen Chang | pdf

    pptx

    [BP Code]

    [GraphCut Code gco]

    [MRFsfm]
  •   [BP] Understanding belief propagation and its generalizations.
  •   [GraphCut] Fast approximate energy minimization via graph cuts.
  •   [DistanceTransform] Distance transforms of sampled functions.
  •   [LazySnapping] Lazy snapping.
  •   [ConnectedCRF] Efficient inference in fully connected crfs with gaussian edge potentials.
  •   [MRFsfm] Discrete-Continuous Optimization for Large-Scale Structure from Motion.
  •   [MRFsfmPAMI] SfM with MRFs: Discrete-Continuous Optimization for Large-Scale Reconstruction.
  •   [EfficientBP] Efficient belief propagation for early vision.
  •   [TextonBoost] Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation.
  •   [CRFobject] Conditional random fields for object recognition.
Mar 17 Mon | No Class (Spring Recess)
Mar 19 Wed | No Class (Spring Recess)
Mar 24 Mon | Cloud Computing | John McSpedon | pdf

    pptx

    demo code

Mar 26 Wed | Object Detection | Shuran Song | pdf

    pptx

    [DPM code]

    [Vlfeat code]

    [Color SIFT code]
  •   [DevaSVM] Dual coordinate solvers for large-scale structural SVMs.
  •   [PictorialStructure] The representation and matching of pictorial structures.
  •   [DalalTriggs] Histograms of oriented gradients for human detection.
  •   [DPM] Object Detection with Discriminatively Trained Part Based Models.
  •   [ExemplarSVMs] Ensemble of exemplar-svms for object detection and beyond.
  •   [PartMixtures] Articulated pose estimation with flexible mixtures-of-parts.
  •   [Poselet] Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations.
  •   [ExemplarSVMsMatching] Data-driven Visual Similarity for Cross-domain Image Matching.
  •   [FindingThings] Finding things: Image parsing with regions and per-exemplar detectors.
  •   [SelectiveSearch] Segmentation as selective search for object recognition.
  •   [Regionlets] Regionlets for Generic Object Detection.
  •   [CF] Model recommendation for action recognition.
  •   [LDA] Discriminative decorrelation for clustering and classification.
  •   [Cuboid] Localizing 3D Cuboids in Single-view Images.
Mar 31 Mon | Features and Datasets | Shuran Song
  •   [SIFT] Distinctive image features from scale-invariant keypoints.
  •   [ColorSIFT] Evaluating Color Descriptors for Object and Scene Recognition.
  •   [DalalTriggs] Histograms of oriented gradients for human detection.
  •   [DPM] Object Detection with Discriminatively Trained Part Based Models.
  •   [GIST] Modeling the shape of the scene: A holistic representation of the spatial envelope.
  •   [LBP] Multiresolution gray-scale and rotation invariant texture classification with local binary patterns.
  •   [PrinciplesOfCategorization] Principles of categorization.
  •   [Visipedia] Vision of a Visipedia.
  •   [SUNDB] SUN Database: Exploring a Large Collection of Scene Categories.
  •   [PASCAL] The pascal visual object classes (voc) challenge.
  •   [ImageNet] Imagenet: A large-scale hierarchical image database.
  •   [SUN3D] SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels.
Apr 2 Wed | BOW + SPM + Sparse Coding | Xinyi Fan | pdf

    key

  •   [SPM] Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories.
  •   [LLC] Locality-constrained linear coding for image classification.
  •   [LSPM] Linear spatial pyramid matching using sparse coding for image classification.
  •   [FisherVector] Image Classification with the Fisher Vector: Theory and Practice.
  •   [FisherKernel] Improving the fisher kernel for large-scale image classification.
  •   [CodingComparison] The devil is in the details: an evaluation of recent feature encoding methods.
  •   [SmallCodes] Small codes and large image databases for recognition.
  •   [MultidimensionalSpectralHashing] Multidimensional spectral hashing.
  •   [SpectralHashing] Spectral hashing.
  •   [CompactCodes] Aggregating local image descriptors into compact codes.
Apr 7 Mon | Instance-level Matching | Pingmei Xu | pdf

    key

  •   [VideoGoogle] Video Google: A text retrieval approach to object matching in videos.
  •   [GoogleGoggle] Object retrieval with large vocabularies and fast spatial matching.
  •   [Quantization] Lost in quantization: Improving particular object retrieval in large scale image databases.
  •   [TotalRecall] Total recall: Automatic query expansion with a generative feature model for object retrieval.
  •   [InstanceLevelRecognition] 3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints.
  •   [GeometricEra] Object recognition in the geometric era: A retrospective.
Apr 9 Wed | Web Programming | Pingmei Xu | pdf

    key
    w3schools.com
Apr 14 Mon | WebGL + Blender (Basic + Command Line Tool) | Maciej Halber | WebGL pdf
    WebGL key
    WebGL code

    Blender key
    Blender pdf
    BlenderScript
    BlenderFiles
    Learning WebGL Lessons
Apr 16 Wed | Crowd Sourcing | Simin Chen | pdf

    pptx

    [Matlab Turk API]

    [DrawMe code]

    [TurkCleaner code]
  •   [HumanInTheLoop] Visual recognition with humans in the loop.
  •   [Rating] Online crowdsourcing: rating annotators and obtaining cost-effective labels.
  •   [InteractiveTraining] Strong supervision from weak annotation: Interactive training of deformable part models.
  •   [Turkit] Turkit: human computation algorithms on mechanical turk.
  •   [ParallelHuman] Exploring iterative and parallel human computation processes.
  •   [ProgrammingHuman] Programming with human computation.
  •   [CrowdPowered] Crowd-powered systems.
  •   [ImageNet] Imagenet: A large-scale hierarchical image database.
Apr 21 Mon | Scene and Context | Yinda Zhang | pdf

    pptx

  •   [GeometricContext] Geometric context from a single image.
  •   [PhotoPop-up] Automatic photo pop-up.
  •   [RGBDcuboid] A Linear Approach to Matching Cuboids in RGBD Images.
  •   [ExactLayout] Efficient exact inference for 3d indoor scene understanding.
  •   [BoxInBox] Box In the Box: Joint 3D Layout and Object Reasoning from Single Images.
  •   [ObjectPerspective] Putting objects in perspective.
  •   [Make3D] Make3d: Learning 3d scene structure from a single still image.
  •   [HallucinateHuman] Hallucinated Humans as the Hidden Context for Labeling 3D Scenes.
  •   [RoomLayout] Recovering the spatial layout of cluttered rooms.
  •   [StochasticGrammar] A stochastic grammar of images.
  •   [DDMCMC] Image segmentation by data-driven Markov chain Monte Carlo.
  •   [ImageParsing] Image parsing: Unifying segmentation, detection, and recognition.
  •   [AutoContext] Auto-context and its application to high-level vision tasks.
  •   [GrammarParsing] Bottom-up/top-down image parsing with attribute grammar.
  •   [AndOrGraph] A numerical study of the bottom-up and top-down inference processes in and-or graphs.
  •   [SimulationScene] Simulation as an engine of physical scene understanding.
  •   [GrowMind] How to grow a mind: Statistics, structure, and abstraction.
  •   [ProbabilisticGraphics] Approximate Bayesian image interpretation using generative probabilistic graphics programs.
Apr 23 Wed | Semantic Segmentation | Bebe Shi | pdf

    [TextonBoost Code]

    [TextonForest Code]

    [SiftFlow Code]

    [Label Transfer Code]

    [SuperParsing Code]
  •   [TextonBoost] Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation.
  •   [TextonForest] Semantic texton forests for image categorization and segmentation.
  •   [SiftFlow] SIFT flow: dense correspondence across different scenes.
  •   [LabelTransfer] Nonparametric scene parsing via label transfer.
Apr 28 Mon | Compressive Sensing | Li-Fang Cheng | pdf

    pptx

    L1 magic
  •   [InformativeSensingArXiv] Informative sensing.
  •   [InformativeSensingICIP] Informative sensing of natural images.
  •   [InformativeSensing] Informative sensing: theory and applications.
Apr 30 Wed | How to do research + Open Discussion | Jianxiong Xiao | pdf

    pptx
    Bill Freeman's how to do research
    Bill Freeman's crowd sourced note
    Ramesh Raskar's How to invent: The Idea Hexagon

    Tentative Topics:

    Class Requirement:

    Reading List:

    Resources:

    Books

    There is no textbook for this class. The following are just references if you are interested.

    Computer vision:

    Learning:

    Graphical models:

    Related Courses:

    Computer Vision Class at Princeton

    By Antonio Torralba at MIT:

    By Alyosha Efros at CMU/Berkeley:

    By James Hays at Brown:

    By others:

    Code and Datasets

    Songs

    © 2025 Princeton Vision & Robotics Labs ‒ Department of Computer Science @ Princeton University ‒ 35 Olden Street, Princeton, NJ 08540.