COS598 Spring 2015: The Unreasonable Effectiveness of Big Visual Data

Overview:

The emergence of large image and video datasets on the Internet, parallel computers and GPUs, and algorithms such as deep learning has enabled significant breakthroughs in computer vision in the past decade. This class will discuss advanced topics in computer vision where the use of Big Visual Data is changing the nature of the problem. We will focus on leveraging Big Visual Data to bring about new ways of looking at the vision problem. The emphasis is on fundamental concepts (instead of theory or application) of computer vision and artificial intelligence. This class requires a solid background in computer vision and machine learning. The prerequisite is COS429 or equivalent.

Instructor: Jianxiong Xiao
Time: Monday and Wednesday, 3:00-4:20
Location: CS302

Schedule:

Topic	Presenter	Slides	References
Caffe Tutorial	Zhirong Wu	Slides
GoogLeNet	Fisher Yu	Slides	@article{GoogLeNet, author = {Christian Szegedy and Wei Liu and Yangqing Jia and Pierre Sermanet and Scott Reed and Dragomir Anguelov and Dumitru Erhan and Vincent Vanhoucke and Andrew Rabinovich}, title = {Going Deeper with Convolutions}, journal = {CoRR}, volume = {abs/1409.4842}, year = {2014}, url = {http://arxiv.org/abs/1409.4842}, timestamp = {Wed, 01 Oct 2014 15:00:05 +0200}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/SzegedyLJSRAEVR14}, bibsource = {dblp computer science bibliography, http://dblp.org} }
Recurrent Neural Network	Zhirong Wu	Slides	@book{jaeger2002tutorial, title={Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the" echo state network" approach}, author={Jaeger, Herbert}, year={2002}, publisher={GMD-Forschungszentrum Informationstechnik} } @article{LSTM, title={Long short-term memory}, author={Hochreiter, Sepp and Schmidhuber, J{\"u}rgen}, journal={Neural computation}, volume={9}, number={8}, pages={1735--1780}, year={1997}, publisher={MIT Press} } @inproceedings{mnih2014recurrent, title={Recurrent models of visual attention}, author={Mnih, Volodymyr and Heess, Nicolas and Graves, Alex and others}, booktitle={Advances in Neural Information Processing Systems}, pages={2204--2212}, year={2014} } @article{ShowAndTell, title={Show and tell: A neural image caption generator}, author={Vinyals, Oriol and Toshev, Alexander and Bengio, Samy and Erhan, Dumitru}, journal={arXiv preprint arXiv:1411.4555}, year={2014} }
Credit Assignment in NN	Prof. David Balduzzi		@article{balduzzi2014kickback, title={Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks}, author={Balduzzi, David and Vanchinathan, Hastagiri and Buhmann, Joachim}, journal={arXiv preprint arXiv:1411.6191}, year={2014} }
Adversary Network	Linguang Zhang	Slides	@article{Intriguing, title={Intriguing properties of neural networks}, author={Szegedy, Christian and Zaremba, Wojciech and Sutskever, Ilya and Bruna, Joan and Erhan, Dumitru and Goodfellow, Ian and Fergus, Rob}, journal={arXiv preprint arXiv:1312.6199}, year={2013} } @article{Fooled, title={Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images}, author={Nguyen, Anh and Yosinski, Jason and Clune, Jeff}, journal={arXiv preprint arXiv:1412.1897}, year={2014} } @inproceedings{GenerativeAdversarial, title={Generative adversarial nets}, author={Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua}, booktitle={Advances in Neural Information Processing Systems}, pages={2672--2680}, year={2014} } @article{AdversarialExamples, title={Explaining and Harnessing Adversarial Examples}, author={Goodfellow, Ian J and Shlens, Jonathon and Szegedy, Christian}, journal={arXiv preprint arXiv:1412.6572}, year={2014} }
Neural Turing Machine	Zhirong Wu	Slides	@article{NeuralTurningMachine, author = {Alex Graves and Greg Wayne and Ivo Danihelka}, title = {Neural Turing Machines}, journal = {CoRR}, volume = {abs/1410.5401}, year = {2014}, url = {http://arxiv.org/abs/1410.5401}, timestamp = {Sun, 02 Nov 2014 11:25:59 +0100}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/GravesWD14}, bibsource = {dblp computer science bibliography, http://dblp.org} } @article{MemoryNetworks, author = {Jason Weston and Sumit Chopra and Antoine Bordes}, title = {Memory Networks}, journal = {CoRR}, volume = {abs/1410.3916}, year = {2014}, url = {http://arxiv.org/abs/1410.3916}, timestamp = {Sun, 02 Nov 2014 11:25:59 +0100}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/WestonCB14}, bibsource = {dblp computer science bibliography, http://dblp.org} } @article{zaremba2014learning, title={Learning to execute}, author={Zaremba, Wojciech and Sutskever, Ilya}, journal={arXiv preprint arXiv:1410.4615}, year={2014} }
Deep Learning for NLP	Kiran N. Vodrahalli	Slides	@article{WordEmbedding, author = {Tomas Mikolov and Kai Chen and Greg Corrado and Jeffrey Dean}, title = {Efficient Estimation of Word Representations in Vector Space}, journal = {CoRR}, volume = {abs/1301.3781}, year = {2013}, url = {http://arxiv.org/abs/1301.3781}, timestamp = {Mon, 18 Feb 2013 20:50:59 +0100}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1301-3781}, bibsource = {dblp computer science bibliography, http://dblp.org} } @incollection{RepWords, title = {Distributed Representations of Words and Phrases and their Compositionality}, author = {Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff}, booktitle = {Advances in Neural Information Processing Systems 26}, editor = {C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger}, pages = {3111--3119}, year = {2013}, } @article{ParagraphEmbedding, title={Document Embedding with Paragraph Vectors}, author={Dai, Andrew M and Olah, Christopher and Le, Quoc V and Corrado, Greg S} } @inproceedings{ThoughtSpace, title={Sequence to sequence learning with neural networks}, author={Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V}, booktitle={Advances in Neural Information Processing Systems}, pages={3104--3112}, year={2014} } @article{RareWord, title={Addressing the Rare Word Problem in Neural Machine Translation}, author={Luong, Thang and Sutskever, Ilya and Le, Quoc V and Vinyals, Oriol and Zaremba, Wojciech}, journal={arXiv preprint arXiv:1410.8206}, year={2014} } @article{NeuProb, title={A neural probabilistic language model}, author={Bengio, Yoshua and Ducharme, R{\'e}jean and Vincent, Pascal and Janvin, Christian}, journal={The Journal of Machine Learning Research}, volume={3}, pages={1137--1155}, year={2003}, publisher={JMLR.org} } @inproceedings{RecurrentLanguage, title={Recurrent neural network based language model.}, author={Mikolov, Tomas and Karafi{\'a}t, Martin and Burget, Lukas and Cernock{\`y}, Jan and Khudanpur, Sanjeev}, booktitle={INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010}, pages={1045--1048}, year={2010} } @article{SemanticHashing, title={Semantic hashing}, author={Salakhutdinov, Ruslan and Hinton, Geoffrey}, journal={International Journal of Approximate Reasoning}, volume={50}, number={7}, pages={969--978}, year={2009}, publisher={Elsevier} }
(cont.)	Kiran N. Vodrahalli
(cont.)	Kiran N. Vodrahalli
Image Captioning	Kiran N. Vodrahalli	Slides	https://pdollar.wordpress.com/2015/01/21/image-captioning/
Question Answering Machine	Shuran Song	Slides	https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/object-recognition-and-scene-understanding/visual-turing-challenge/ http://start.csail.mit.edu/start-system.html http://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=2099 https://plus.google.com/+AndrejKarpathy/posts/6ywXT85yiUU http://sirius.clarity-lab.org/ @article{ferrucci2010building, title={Building Watson: An overview of the DeepQA project}, author={Ferrucci, David and Brown, Eric and Chu-Carroll, Jennifer and Fan, James and Gondek, David and Kalyanpur, Aditya A and Lally, Adam and Murdock, J William and Nyberg, Eric and Prager, John and others}, journal={AI magazine}, volume={31}, number={3}, pages={59--79}, year={2010} } @inproceedings{Sirius, author = {Hauswald, Johann and Laurenzano, Michael A. and Zhang, Yunqi and Li, Cheng and Rovinski, Austin and Khurana, Arjun and Dreslinski, Ron and Mudge, Trevor and Petrucci, Vinicius and Tang, Lingjia and Mars, Jason}, title = {Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers}, booktitle = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, series = {ASPLOS '15}, year = {2015}, numpages = {13}, publisher = {ACM}, address = {New York, NY, USA}, note = {Acceptance Rate: 17% }, }
Knowledge Base and Common Sense	Yinda Zhang	Slides	http://start.csail.mit.edu/start-system.html http://www.wikidata.org http://dbpedia.org http://conceptnet5.media.mit.edu/ http://www.freebase.com http://www.neil-kb.com http://robobrain.me @incollection{zhu2014reasoning, title={Reasoning about object affordances in a knowledge base representation}, author={Zhu, Yuke and Fathi, Alireza and Fei-Fei, Li}, booktitle={Computer Vision--ECCV 2014}, pages={408--424}, year={2014}, publisher={Springer} }
Lifelong Visual Mapping	Linguang Zhang	Slides	@inproceedings{finman2013toward, title={Toward lifelong object segmentation from change detection in dense rgb-d maps}, author={Finman, Ross and Whelan, Thomas and Kaess, Michael and Leonard, John J}, booktitle={Mobile Robots (ECMR), 2013 European Conference on}, pages={178--185}, year={2013}, organization={IEEE} } @inproceedings{Collet_Romea_2014_7677, author = "Alvaro {Collet Romea} and Bo Xiong and Corina Gurau and Martial Hebert and Siddhartha Srinivasa", title = "HerbDisc: Towards Lifelong Robotic Object Discovery", booktitle = "International Journal of Robotics Research (IJRR)", year = "2014", } @phdthesis{Collet_Romea_2012_7326, author = "Alvaro {Collet Romea}", title = "Lifelong Robotic Object Perception", booktitle = "", school = "Robotics Institute, Carnegie Mellon University", month = "August", year = "2012", number= "CMU-RI-TR-12-22", address= "Pittsburgh, PA", } @inproceedings{finman2014efficient, title={Efficient incremental map segmentation in dense RGB-D maps}, author={Finman, Ross and Whelan, Thomas and Kaess, Michael and Leonard, John J}, booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on}, pages={5488--5494}, year={2014}, organization={IEEE} } @article{whelan3d, title={3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world}, author={Whelan, Thomas and Kaess, Michael and Finman, Ross and Fallon, Maurice and Johannsson, Hordur and Leonard, John J and McDonald, John} } @phdthesis{finman2012real, title={Real-time large object category recognition using robust RGB-D segmentation features}, author={Finman, Ross Edward}, year={2012}, school={Massachusetts Institute of Technology} } @techreport{johannsson2013toward, title={Toward lifelong visual localization and mapping}, author={Johannsson, Hordur}, year={2013}, institution={DTIC Document} } @inproceedings{SLAMpp, title={Slam++: Simultaneous localisation and mapping at the level of objects}, author={Salas-Moreno, Renato F and Newcombe, Richard A and Strasdat, Hauke and Kelly, Paul HJ and Davison, Andrew J}, booktitle={Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on}, pages={1352--1359}, year={2013}, organization={IEEE} } @incollection{fioraio2013towards, title={Towards Semantic KinectFusion}, author={Fioraio, Nicola and Cerri, Gregorio and Di Stefano, Luigi}, booktitle={Image Analysis and Processing--ICIAP 2013}, pages={299--308}, year={2013}, publisher={Springer} }
Probabilistic Programming	Zhirong Wu	Slides	@article{HowToGrowAMind, title={How to grow a mind: Statistics, structure, and abstraction}, author={Tenenbaum, Joshua B and Kemp, Charles and Griffiths, Thomas L and Goodman, Noah D}, journal={science}, volume={331}, number={6022}, pages={1279--1285}, year={2011}, publisher={American Association for the Advancement of Science} } @article{Church, title={Church: a language for generative models}, author={Goodman, Noah and Mansinghka, Vikash and Roy, Daniel and Bonawitz, Keith and Tarlow, Daniel}, journal={arXiv preprint arXiv:1206.3255}, year={2012} }
Probabilistic Programming (Cont.)	Zhirong Wu
3D Shape Representation	Tianqiang Liu	Slides	@article{arslan20143d, title={3d Object Reconstruction from a Single Image.}, author={Arslan, Ozan}, journal={International Journal of Environment and Geoinformatics}, volume={1}, number={1}, year={2014} } @inproceedings{rother2009seeing, title={Seeing 3D objects in a single 2D image}, author={Rother, Diego and Sapiro, Guillermo}, booktitle={Computer Vision, 2009 IEEE 12th International Conference on}, pages={1819--1826}, year={2009}, organization={IEEE} } @article{rother2011hypothesize2, title={Hypothesize and bound: A computational focus of attention mechanism for simultaneous 3D shape reconstruction, pose estimation and classification from a single 2D image}, author={Rother, Diego and Mahendran, Siddharth and Vidal, Rene}, journal={arXiv preprint arXiv:1109.5730}, year={2011} } @inproceedings{rother2011hypothesize, title={A hypothesize-and-bound algorithm for simultaneous object classification, pose estimation and 3D reconstruction from a single 2D image}, author={Rother, Diego and Vidal, Ren{\'e}}, booktitle={Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on}, pages={553--560}, year={2011}, organization={IEEE} } @inproceedings{prisacariu2011shared, title={Shared shape spaces}, author={Prisacariu, Victor Adrian and Reid, Ian}, booktitle={Computer Vision (ICCV), 2011 IEEE International Conference on}, pages={2587--2594}, year={2011}, organization={IEEE} } @incollection{prisacariu2013simultaneous, title={Simultaneous monocular 2d segmentation, 3d pose recovery and 3d reconstruction}, author={Prisacariu, Victor Adrian and Segal, Aleksandr V and Reid, Ian}, booktitle={Computer Vision--ACCV 2012}, pages={593--606}, year={2013}, publisher={Springer} } @article{Coconstraints2014, title = {Co-Constrained Handles for Deformation in Shape Collections}, author = {Yumer, M. E. and Kara, L. B.}, journal = {ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2014)}, volume = {33}, issue = {6}, pages = {187:1-187:11}, year = {2014}, } @InProceedings{xiang_wacv14, author = {Yu Xiang and Roozbeh Mottaghi and Silvio Savarese}, title = {Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, year = {2014}, } @incollection{xiang2014monocular, title={Monocular multiview object tracking with 3d aspect parts}, author={Xiang, Yu and Song, Changkyu and Mottaghi, Roozbeh and Savarese, Silvio}, booktitle={Computer Vision--ECCV 2014}, pages={220--235}, year={2014}, publisher={Springer} } @article{kalogerakis2012probabilistic, title={A probabilistic model for component-based shape synthesis}, author={Kalogerakis, Evangelos and Chaudhuri, Siddhartha and Koller, Daphne and Koltun, Vladlen}, journal={ACM Transactions on Graphics (TOG)}, volume={31}, number={4}, pages={55}, year={2012}, publisher={ACM} }
3D Shape Representation (Cont.)	Tianqiang Liu
Video Object Recognition	Chenyi Chen	Slides	http://calvin.inf.ed.ac.uk/datasets/youtube-objects-dataset/ Pawan Sinha TED talk (08:24) @article{kalogeiton2015analysing, title={Analysing domain shift factors between videos and images for object detection}, author={Kalogeiton, Vicky and Ferrari, Vittorio and Schmid, Cordelia}, journal={arXiv preprint arXiv:1501.01186}, year={2015} } @inproceedings{prest2012learning, title={Learning object class detectors from weakly annotated video}, author={Prest, Alessandro and Leistner, Christian and Civera, Javier and Schmid, Cordelia and Ferrari, Vittorio}, booktitle={Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on}, pages={3282--3289}, year={2012}, organization={IEEE} } @inproceedings{papazoglou2013fast, title={Fast object segmentation in unconstrained video}, author={Papazoglou, Anestis and Ferrari, Vittorio}, booktitle={Computer Vision (ICCV), 2013 IEEE International Conference on}, pages={1777--1784}, year={2013}, organization={IEEE} } @article{ostrovsky2009visual, title={Visual parsing after recovery from blindness}, author={Ostrovsky, Yuri and Meyers, Ethan and Ganesh, Suma and Mathur, Umang and Sinha, Pawan}, journal={Psychological Science}, volume={20}, number={12}, pages={1484--1491}, year={2009}, publisher={SAGE Publications} } @book{valvo1971sight, title={Sight restoration after long term blindness: The problems and behavior patterns of visual rehabilitation}, author={Valvo, Alberto}, year={1971}, publisher={American Foundation for the Blind} } @article{Wu12Eulerian, author = {Hao-Yu Wu and Michael Rubinstein and Eugene Shih and John Guttag and Fr\'{e}do Durand and William T. Freeman}, title = {Eulerian Video Magnification for Revealing Subtle Changes in the World}, journal = {ACM Trans. Graph. (Proceedings SIGGRAPH 2012)}, year = {2012}, volume = {31}, number = {4}, } @inproceedings{butko2006learning, title={Learning about humans during the first 6 minutes of life}, author={Butko, Nicholas J and Fasel, I and Movellan, Javier R}, booktitle={International Conference on Development and Learning, Indiana}, year={2006} }
Deep Learning for Videos	Sachin Ravi		@article{VideoModeling, author = {Marc'Aurelio Ranzato and Arthur Szlam and Joan Bruna and Micha{\"{e}}l Mathieu and Ronan Collobert and Sumit Chopra}, title = {Video (language) modeling: a baseline for generative models of natural videos}, journal = {CoRR}, volume = {abs/1412.6604}, year = {2014}, url = {http://arxiv.org/abs/1412.6604}, timestamp = {Thu, 01 Jan 2015 19:51:08 +0100}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/RanzatoSBMCC14}, bibsource = {dblp computer science bibliography, http://dblp.org} } @inproceedings{karpathy2014large, title={Large-scale video classification with convolutional neural networks}, author={Karpathy, Andrej and Toderici, George and Shetty, Sanketh and Leung, Thomas and Sukthankar, Rahul and Fei-Fei, Li}, booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2014} } @article{DomainShift, title={Analysing domain shift factors between videos and images for object detection}, author={Kalogeiton, Vicky and Ferrari, Vittorio and Schmid, Cordelia}, journal={arXiv preprint arXiv:1501.01186}, year={2015} } @article{videoLSTM, author = {Nitish Srivastava and Elman Mansimov and Ruslan Salakhutdinov}, title = {Unsupervised Learning of Video Representations using LSTMs}, journal = {CoRR}, volume = {abs/1502.04681}, year = {2015}, url = {http://arxiv.org/abs/1502.04681}, timestamp = {Mon, 02 Mar 2015 14:17:34 +0100}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/SrivastavaMS15}, bibsource = {dblp computer science bibliography, http://dblp.org} } @article{tran2014c3d, title={C3D: Generic Features for Video Analysis}, author={Tran, Du and Bourdev, Lubomir and Fergus, Rob and Torresani, Lorenzo and Paluri, Manohar}, journal={arXiv preprint arXiv:1412.0767}, year={2014} } @article{ng2015beyond, title={Beyond Short Snippets: Deep Networks for Video Classification}, author={Ng, Joe Yue-Hei and Hausknecht, Matthew and Vijayanarasimhan, Sudheendra and Vinyals, Oriol and Monga, Rajat and Toderici, George}, journal={arXiv preprint arXiv:1503.08909}, year={2015} }
Deep Learning for Speech Recognition	Jeremy Cohen		Microsoft MAVIS @article{hinton2012deep, title={Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups}, author={Hinton, Geoffrey and Deng, Li and Yu, Dong and Dahl, George E and Mohamed, Abdel-rahman and Jaitly, Navdeep and Senior, Andrew and Vanhoucke, Vincent and Nguyen, Patrick and Sainath, Tara N and others}, journal={Signal Processing Magazine, IEEE}, volume={29}, number={6}, pages={82--97}, year={2012}, publisher={IEEE} } @article{dahl2012context, title={Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition}, author={Dahl, George E and Yu, Dong and Deng, Li and Acero, Alex}, journal={Audio, Speech, and Language Processing, IEEE Transactions on}, volume={20}, number={1}, pages={30--42}, year={2012}, publisher={IEEE} }
Deep Learning for Object Detection	Gabriel Huang		@article{zhou2014object, title={Object Detectors Emerge in Deep Scene CNNs}, author={Zhou, Bolei and Khosla, Aditya and Lapedriza, Agata and Oliva, Aude and Torralba, Antonio}, journal={arXiv preprint arXiv:1412.6856}, year={2014} } @article{girshick2014deformable, title={Deformable part models are convolutional neural networks}, author={Girshick, Ross and Iandola, Forrest and Darrell, Trevor and Malik, Jitendra}, journal={arXiv preprint arXiv:1409.5403}, year={2014} }
Attention and Low Resolution	Pingmei Xu		@article{shen2014learning, title={Learning to predict eye fixations for semantic contents using multi-layer sparse network}, author={Shen, Chengyao and Zhao, Qi}, journal={Neurocomputing}, volume={138}, pages={61--68}, year={2014}, publisher={Elsevier} } @article{DeepGaze, author = {Matthias Kummerer and Lucas Theis and Matthias Bethge}, title = {Deep Gaze {I:} Boosting Saliency Prediction with Feature Maps Trained on ImageNet}, journal = {CoRR}, }
Attention and Low Resolution			@inproceedings{alexe2012searching, title={Searching for objects driven by context}, author={Alexe, Bogdan and Heess, Nicolas and Teh, Yee W and Ferrari, Vittorio}, booktitle={Advances in Neural Information Processing Systems}, pages={881--889}, year={2012} } @inproceedings{mnih2014recurrent, title={Recurrent models of visual attention}, author={Mnih, Volodymyr and Heess, Nicolas and Graves, Alex and others}, booktitle={Advances in Neural Information Processing Systems}, pages={2204--2212}, year={2014} } @article{murali2012autonomous, title={Autonomous exploration using rapid perception of low-resolution image information}, author={Murali, Vidya N and Birchfield, Stanley T}, journal={Autonomous Robots}, volume={32}, number={2}, pages={115--128}, year={2012}, publisher={Springer} } @article{torralba2009many, title={How many pixels make an image?}, author={Torralba, Antonio}, journal={Visual neuroscience}, volume={26}, number={01}, pages={123--131}, year={2009}, publisher={Cambridge Univ Press} } @article{torralba200880, title={80 million tiny images: A large data set for nonparametric object and scene recognition}, author={Torralba, Antonio and Fergus, Robert and Freeman, William T}, journal={Pattern Analysis and Machine Intelligence, IEEE Transactions on}, volume={30}, number={11}, pages={1958--1970}, year={2008}, publisher={IEEE} } @inproceedings{butko2006learning, title={Learning about humans during the first 6 minutes of life}, author={Butko, Nicholas J and Fasel, I and Movellan, Javier R}, booktitle={International Conference on Development and Learning, Indiana}, year={2006} } @article{DeepGaze, author = {Matthias Kummerer and Lucas Theis and Matthias Bethge}, title = {Deep Gaze {I:} Boosting Saliency Prediction with Feature Maps Trained on ImageNet}, journal = {CoRR}, }
Simulation and Knowledge Representation for Robotics	Shuran Song		@inproceedings{pronobis2012large, title={Large-scale semantic mapping and reasoning with heterogeneous modalities}, author={Pronobis, Andrzej and Jensfelt, Patric}, booktitle={Robotics and Automation (ICRA), 2012 IEEE International Conference on}, pages={3515--3522}, year={2012}, organization={IEEE} } @article{aydemir2013active, title={Active visual object search in unknown environments using uncertain semantics}, author={Aydemir, Alper and Pronobis, Andrzej and G{\"o}belbecker, Moritz and Jensfelt, Patric}, journal={Robotics, IEEE Transactions on}, volume={29}, number={4}, pages={986--1002}, year={2013}, publisher={IEEE} } @inproceedings{aydemir2012can, title={What can we learn from 38,000 rooms? reasoning about unexplored space in indoor environments}, author={Aydemir, Alper and Jensfelt, Patric and Folkesson, John}, booktitle={Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on}, pages={4675--4682}, year={2012}, organization={IEEE} } @inproceedings{hanheide2011exploiting, title={Exploiting probabilistic knowledge under uncertain sensing for efficient robot behaviour}, author={Hanheide, Marc and Gretton, Charles and Dearden, Richard and Hawes, Nick and Wyatt, Jeremy and Pronobis, Andrzej and Aydemir, Alper and G{\"o}belbecker, Moritz and Zender, Hendrik}, booktitle={IJCAI Proceedings-International Joint Conference on Artificial Intelligence}, volume={22}, number={3}, pages={2442}, year={2011} } @inproceedings{ziebart2009planning, title={Planning-based prediction for pedestrians}, author={Ziebart, Brian D and Ratliff, Nathan and Gallagher, Garratt and Mertz, Christoph and Peterson, Kevin and Bagnell, James A and Hebert, Martial and Dey, Anind K and Srinivasa, Siddhartha}, booktitle={Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on}, pages={3931--3936}, year={2009}, organization={IEEE} } @article{treuille2006continuum, title={Continuum crowds}, author={Treuille, Adrien and Cooper, Seth and Popovi{\'c}, Zoran}, journal={ACM Transactions on Graphics (TOG)}, volume={25}, number={3}, pages={1160--1168}, year={2006}, publisher={ACM} }
Computer Vision as Inverse Graphics	Yinda Zhang		@article{InverseGraphics, author = {Tejas D. Kulkarni and Vikash K. Mansinghka and Pushmeet Kohli and Joshua B. Tenenbaum}, title = {Inverse Graphics with Probabilistic {CAD} Models}, journal = {CoRR}, volume = {abs/1407.1339}, year = {2014}, url = {http://arxiv.org/abs/1407.1339}, timestamp = {Fri, 01 Aug 2014 13:50:01 +0200}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/KulkarniMKT14}, bibsource = {dblp computer science bibliography, http://dblp.org} } @article{Picture, title={Picture: A probabilistic programming language for scene perception}, author={Kulkarni, Tejas D and Kohli, Pushmeet and Tenenbaum, Joshua B and Mansinghka, Vikash} } @inproceedings{ProbGraphics, title={Approximate Bayesian image interpretation using generative probabilistic graphics programs}, author={Mansinghka, Vikash and Kulkarni, Tejas D and Perov, Yura N and Tenenbaum, Josh}, booktitle={Advances in Neural Information Processing Systems}, pages={1520--1528}, year={2013} } @article{DeepGen, title={Deep Generative Vision as Approximate Bayesian Computation}, author={Kulkarni, Tejas D and Yildirim, Ilker and Kohli, Pushmeet and Freiwald, Winrich A and Tenenbaum, Joshua B} } @incollection{OpenDR, title={Opendr: An approximate differentiable renderer}, author={Loper, Matthew M and Black, Michael J}, booktitle={Computer Vision--ECCV 2014}, pages={154--169}, year={2014}, publisher={Springer} } @article{mccloskey1983intuitive, title={Intuitive physics}, author={McCloskey, Michael}, journal={Scientific american}, volume={248}, number={4}, pages={122--130}, year={1983} } @article{battagliacomputational, title={Computational Models of Intuitive Physics}, author={Battaglia, Peter and Ullman, Tomer and Tenenbaum, Joshua and Sanborn, Adam and Forbus, Kenneth and Gerstenberg, Tobias and Lagnado, David} } @article{tang2012deep, title={Deep Lambertian Networks}, author={Tang, Yichuan and Salakhutdinov, Ruslan and Hinton, Geoffrey}, journal={arXiv preprint arXiv:1206.6445}, year={2012} }
Deep Learning for Object Detection			@inproceedings{szegedy2013deep, title={Deep neural networks for object detection}, author={Szegedy, Christian and Toshev, Alexander and Erhan, Dumitru}, booktitle={Advances in Neural Information Processing Systems}, pages={2553--2561}, year={2013} } @article{szegedy2014scalable, title={Scalable, High-Quality Object Detection}, author={Szegedy, Christian and Reed, Scott and Erhan, Dumitru and Anguelov, Dragomir}, journal={arXiv preprint arXiv:1412.1441}, year={2014} } @article{girshick2014deformable, title={Deformable part models are convolutional neural networks}, author={Girshick, Ross and Iandola, Forrest and Darrell, Trevor and Malik, Jitendra}, journal={arXiv preprint arXiv:1409.5403}, year={2014} } @article{zhou2014object, title={Object Detectors Emerge in Deep Scene CNNs}, author={Zhou, Bolei and Khosla, Aditya and Lapedriza, Agata and Oliva, Aude and Torralba, Antonio}, journal={arXiv preprint arXiv:1412.6856}, year={2014} } @inproceedings{Oquab14, author = "Oquab, M. and Bottou, L. and Laptev, I. and Sivic, J.", title = "Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks", booktitle = "CVPR", year = "2014", } @inproceedings{hoffman2014lsda, title={LSDA: Large scale detection through adaptation}, author={Hoffman, Judy and Guadarrama, Sergio and Tzeng, Eric S and Hu, Ronghang and Donahue, Jeff and Girshick, Ross and Darrell, Trevor and Saenko, Kate}, booktitle={Advances in Neural Information Processing Systems}, pages={3536--3544}, year={2014} }
Latest News on Neural Network for Classification			@article{romero2014fitnets, title={FitNets: Hints for Thin Deep Nets}, author={Romero, Adriana and Ballas, Nicolas and Kahou, Samira Ebrahimi and Chassang, Antoine and Gatta, Carlo and Bengio, Yoshua}, journal={arXiv preprint arXiv:1412.6550}, year={2014} } @article{he2014spatial, title={Spatial pyramid pooling in deep convolutional networks for visual recognition}, author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, journal={arXiv preprint arXiv:1406.4729}, year={2014} } @incollection{agrawal2014analyzing, title={Analyzing the performance of multilayer neural networks for object recognition}, author={Agrawal, Pulkit and Girshick, Ross and Malik, Jitendra}, booktitle={Computer Vision--ECCV 2014}, pages={329--344}, year={2014}, publisher={Springer} } @article{jaderberg2014synthetic, title={Synthetic data and artificial neural networks for natural scene text recognition}, author={Jaderberg, Max and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew}, journal={arXiv preprint arXiv:1406.2227}, year={2014} } @inproceedings{ba2014deep, title={Do Deep Nets Really Need to be Deep?}, author={Ba, Jimmy and Caruana, Rich}, booktitle={Advances in Neural Information Processing Systems}, pages={2654--2662}, year={2014} } @article{lee2014deeply, title={Deeply-supervised nets}, author={Lee, Chen-Yu and Xie, Saining and Gallagher, Patrick and Zhang, Zhengyou and Tu, Zhuowen}, journal={arXiv preprint arXiv:1409.5185}, year={2014} } @article{chatfield2014return, title={Return of the devil in the details: Delving deep into convolutional nets}, author={Chatfield, Ken and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew}, journal={arXiv preprint arXiv:1405.3531}, year={2014} } @article{wu2015deep, title={Deep Image: Scaling up Image Recognition}, author={Wu, Ren and Yan, Shengen and Shan, Yi and Dang, Qingqing and Sun, Gang}, journal={arXiv preprint arXiv:1501.02876}, year={2015} } @article{ioffe2015batch, title={Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift}, author={Ioffe, Sergey and Szegedy, Christian}, journal={arXiv preprint arXiv:1502.03167}, year={2015} } @article{he2015delving, title={Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification}, author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, journal={arXiv preprint arXiv:1502.01852}, year={2015} }

Tentative Topics:

Big Dataset Collection
Knowledge Base (e.g. Freebase, http://www.neil-kb.com)
Common Sense Representation
Crowd Sourcing (e.g. http://robobrain.me/)
Question Answering Machine for Image
Deep Convolutional Neural Nets
Image Caption Generation
Deep Learning for Natural Language Processing (Machine Translation)
Autonomous Driving Vehicle
Deep Learning for Videos
Neural Turing Machine
3D Deep Learning (e.g. ZNN)
Computer Vision as Inverse Graphics
Lifelong Visual Mapping

Class Requirement:

Each student will sign up for the topic that they know best, and students will take turns giving an in-depth tutorial to the class.
There is no exam for the class. The grade directly depends on the quality of your presentation.
Your presentation should assume zero prior knowledge about the subject, and should be as clear and understandable as possible.
Your presentation should start by explaining the main idea and the main equations, and then go into great detail without losing the audience.
Your presentation should be very technical. You should read the papers you present many times, and read the source code for the papers if available.
You should know everything about the subject you present and be prepared to answer any questions from the class.
You should try to integrate several papers on the topic into a coherent presentation. Instead of dividing your presentation into several disconnected parts, one for each paper, try to give a coherent tutorial.
The papers listed on the schedule are not the complete set of papers for the topic. You should do a literature review and search Google Scholar to include other important relevant papers as well. If you find one outside my list, email the title to me so I can include it in the list.
Send the slides (PDF + PPT or Keynote source files) to me after your presentation so I can put them online.

Other Papers:

@article{VisionEasier, title={Vision is getting easier every day}, author={Cavanagh, Patrick}, journal={Perception}, volume={24}, number={11}, pages={1227}, url = {http://lpp.psycho.univ-paris5.fr/pdf/PapersPC/1995/Cavanagh-24-1995-1227-32.pdf}, year={1995} } @article{Visipedia, title={Vision of a Visipedia}, author={Perona, Pietro}, journal={Proceedings of the IEEE}, volume={98}, number={8}, pages={1526--1534}, year={2010}, publisher={IEEE} }