COS598 Spring 2015: The Unreasonable Effectiveness of Big Visual Data

Overview:

The emergence of large image and video datasets on the Internet, parallel computers and GPUs, and algorithms such as deep learning has enabled significant breakthroughs in computer vision in the past decade. This class will discuss advanced topics in computer vision where the use of Big Visual Data is changing the nature of the problem. We will focus on leveraging Big Visual Data to bring about new ways of looking at the vision problem. The emphasis is on fundamental concepts (rather than theory or application) of computer vision and artificial intelligence. This class requires a solid background in computer vision and machine learning. The prerequisite is COS429 or equivalent.

Schedule:

Date | Topic | Presenter | Slides | Reading
DLRG | Caffe Tutorial | Zhirong Wu | Slides
DLRG | GoogLeNet | Fisher Yu | Slides
  •   [GoogLeNet] Going Deeper with Convolutions.
DLRG | Recurrent Neural Network | Zhirong Wu | Slides
  •   [jaeger2002tutorial] Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach.
  •   [LSTM] Long short-term memory.
  •   [mnih2014recurrent] Recurrent models of visual attention.
  •   [ShowAndTell] Show and tell: A neural image caption generator.
Feb 2 Mon | Credit Assignment in NN | Prof. David Balduzzi
  •   [balduzzi2014kickback] Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks.
Feb 4 Wed | Adversary Network | Linguang Zhang | Slides
  •   [Intriguing] Intriguing properties of neural networks.
  •   [Fooled] Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images.
  •   [GenerativeAdversarial] Generative adversarial nets.
  •   [AdversarialExamples] Explaining and Harnessing Adversarial Examples.
Feb 9 Mon | Neural Turing Machine | Zhirong Wu | Slides
  •   [NeuralTurningMachine] Neural Turing Machines.
  •   [MemoryNetworks] Memory Networks.
  •   [zaremba2014learning] Learning to execute.
Feb 11 Wed | Deep Learning for NLP | Kiran N. Vodrahalli | Slides
  •   [WordEmbedding] Efficient Estimation of Word Representations in Vector Space.
  •   [RepWords] Distributed Representations of Words and Phrases and their Compositionality.
  •   [ParagraphEmbedding] Document Embedding with Paragraph Vectors.
  •   [ThoughtSpace] Sequence to sequence learning with neural networks.
  •   [RareWord] Addressing the Rare Word Problem in Neural Machine Translation.
  •   [NeuProb] A neural probabilistic language model.
  •   [RecurrentLanguage] Recurrent neural network based language model.
  •   [SemanticHashing] Semantic hashing.
Feb 16 Mon | (cont.) | Kiran N. Vodrahalli
Feb 18 Wed | (cont.) | Kiran N. Vodrahalli
Feb 23 Mon | Image Captioning | Kiran N. Vodrahalli | Slides
  •   https://pdollar.wordpress.com/2015/01/21/image-captioning/
Feb 25 Wed | Question Answering Machine | Shuran Song | Slides
  •   https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/object-recognition-and-scene-understanding/visual-turing-challenge/
  •   http://start.csail.mit.edu/start-system.html
  •   http://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=2099
  •   https://plus.google.com/+AndrejKarpathy/posts/6ywXT85yiUU
  •   http://sirius.clarity-lab.org/
Mar 2 Mon | Knowledge Base and Common Sense | Yinda Zhang | Slides
  •   http://start.csail.mit.edu/start-system.html
  •   http://www.wikidata.org
  •   http://dbpedia.org
  •   http://conceptnet5.media.mit.edu/
  •   http://www.freebase.com
  •   http://www.neil-kb.com
  •   http://robobrain.me
Mar 4 Wed | (Traveling)
Mar 9 Mon | Lifelong Visual Mapping | Linguang Zhang | Slides
  •   [finman2013toward] Toward lifelong object segmentation from change detection in dense rgb-d maps.
  •   [Collet_Romea_2014_7677] HerbDisc: Towards Lifelong Robotic Object Discovery.
  •   [Collet_Romea_2012_7326] Lifelong Robotic Object Perception.
  •   [finman2014efficient] Efficient incremental map segmentation in dense RGB-D maps.
  •   [whelan3d] 3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world.
  •   [finman2012real] Real-time large object category recognition using robust RGB-D segmentation features.
  •   [johannsson2013toward] Toward lifelong visual localization and mapping.
  •   [SLAMpp] SLAM++: Simultaneous localisation and mapping at the level of objects.
  •   [fioraio2013towards] Towards Semantic KinectFusion.
Mar 11 Wed | Probabilistic Programming | Zhirong Wu | Slides
  •   [HowToGrowAMind] How to grow a mind: Statistics, structure, and abstraction.
  •   [Church] Church: a language for generative models.
Mar 16 Mon | (No Class: Spring Recess)
Mar 18 Wed | (No Class: Spring Recess)
Mar 23 Mon | Probabilistic Programming (Cont.) | Zhirong Wu
Mar 25 Wed | (Visitor: Talk by Simon Korman)
Mar 30 Mon | (Cancelled)
Apr 1 Wed | 3D Shape Representation | Tianqiang Liu | Slides
  •   [arslan20143d] 3D Object Reconstruction from a Single Image.
  •   [rother2009seeing] Seeing 3D objects in a single 2D image.
  •   [rother2011hypothesize2] Hypothesize and bound: A computational focus of attention mechanism for simultaneous 3D shape reconstruction, pose estimation and classification from a single 2D image.
  •   [rother2011hypothesize] A hypothesize-and-bound algorithm for simultaneous object classification, pose estimation and 3D reconstruction from a single 2D image.
  •   [prisacariu2011shared] Shared shape spaces.
  •   [prisacariu2013simultaneous] Simultaneous monocular 2d segmentation, 3d pose recovery and 3d reconstruction.
  •   [Coconstraints2014] Co-Constrained Handles for Deformation in Shape Collections.
  •   [xiang_wacv14] Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild.
  •   [xiang2014monocular] Monocular multiview object tracking with 3d aspect parts.
  •   [kalogerakis2012probabilistic] A probabilistic model for component-based shape synthesis.
Apr 6 Mon | 3D Shape Representation (Cont.) | Tianqiang Liu
Apr 8 Wed | (Traveling)
Apr 13 Mon | Video Object Recognition | Chenyi Chen | Slides
  •   http://calvin.inf.ed.ac.uk/datasets/youtube-objects-dataset/
  •   Pawan Sinha TED talk (08:24)
Apr 15 Wed | Deep Learning for Videos | Sachin Ravi
  •   [VideoModeling] Video (language) modeling: a baseline for generative models of natural videos.
  •   [karpathy2014large] Large-scale video classification with convolutional neural networks.
  •   [DomainShift] Analysing domain shift factors between videos and images for object detection.
  •   [videoLSTM] Unsupervised Learning of Video Representations using LSTMs.
  •   [tran2014c3d] C3D: Generic Features for Video Analysis.
  •   [ng2015beyond] Beyond Short Snippets: Deep Networks for Video Classification.
Apr 20 Mon | (ICCV Deadline)
Apr 22 Wed | (ICCV Deadline)
Apr 27 Mon | Deep Learning for Speech Recognition | Jeremy Cohen
  •   Microsoft MAVIS
Apr 29 Wed | Deep Learning for Object Detection | Gabriel Huang
  •   [zhou2014object] Object Detectors Emerge in Deep Scene CNNs.
  •   [girshick2014deformable] Deformable part models are convolutional neural networks.
TBD | Attention and Low Resolution | Pingmei Xu
  •   [shen2014learning] Learning to predict eye fixations for semantic contents using multi-layer sparse network.
  •   [DeepGaze] Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet.
TBD | Attention and Low Resolution
  •   [alexe2012searching] Searching for objects driven by context.
  •   [mnih2014recurrent] Recurrent models of visual attention.
  •   [murali2012autonomous] Autonomous exploration using rapid perception of low-resolution image information.
  •   [torralba2009many] How many pixels make an image?
  •   [torralba200880] 80 million tiny images: A large data set for nonparametric object and scene recognition.
  •   [butko2006learning] Learning about humans during the first 6 minutes of life.
  •   [DeepGaze] Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet.
TBD | Simulation and Knowledge Representation for Robotics | Shuran Song
  •   [pronobis2012large] Large-scale semantic mapping and reasoning with heterogeneous modalities.
  •   [aydemir2013active] Active visual object search in unknown environments using uncertain semantics.
  •   [aydemir2012can] What can we learn from 38,000 rooms? reasoning about unexplored space in indoor environments.
  •   [hanheide2011exploiting] Exploiting probabilistic knowledge under uncertain sensing for efficient robot behaviour.
  •   [ziebart2009planning] Planning-based prediction for pedestrians.
  •   [treuille2006continuum] Continuum crowds.
TBD | Computer Vision as Inverse Graphics | Yinda Zhang
  •   [InverseGraphics] Inverse Graphics with Probabilistic CAD Models.
  •   [Picture] Picture: A probabilistic programming language for scene perception.
  •   [ProbGraphics] Approximate Bayesian image interpretation using generative probabilistic graphics programs.
  •   [DeepGen] Deep Generative Vision as Approximate Bayesian Computation.
  •   [OpenDR] OpenDR: An approximate differentiable renderer.
  •   [mccloskey1983intuitive] Intuitive physics.
  •   [battagliacomputational] Computational Models of Intuitive Physics.
  •   [tang2012deep] Deep Lambertian Networks.
TBD | Deep Learning for Object Detection
  •   [szegedy2013deep] Deep neural networks for object detection.
  •   [szegedy2014scalable] Scalable, High-Quality Object Detection.
  •   [girshick2014deformable] Deformable part models are convolutional neural networks.
  •   [zhou2014object] Object Detectors Emerge in Deep Scene CNNs.
  •   [Oquab14] Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks.
  •   [hoffman2014lsda] LSDA: Large scale detection through adaptation.
TBD | Latest News on Neural Networks for Classification
  •   [romero2014fitnets] FitNets: Hints for Thin Deep Nets.
  •   [he2014spatial] Spatial pyramid pooling in deep convolutional networks for visual recognition.
  •   [agrawal2014analyzing] Analyzing the performance of multilayer neural networks for object recognition.
  •   [jaderberg2014synthetic] Synthetic data and artificial neural networks for natural scene text recognition.
  •   [ba2014deep] Do Deep Nets Really Need to be Deep?
  •   [lee2014deeply] Deeply-supervised nets.
  •   [chatfield2014return] Return of the devil in the details: Delving deep into convolutional nets.
  •   [wu2015deep] Deep Image: Scaling up Image Recognition.
  •   [ioffe2015batch] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
  •   [he2015delving] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Tentative Topics:

Class Requirement:

Other Papers:

  •   [VisionEasier] Vision is getting easier every day.
  •   [Visipedia] Vision of a Visipedia.
© 2025 Princeton Vision & Robotics Labs ‒ Department of Computer Science @ Princeton University ‒ 35 Olden Street, Princeton, NJ 08540.