COS598 Spring 2015: The Unreasonable Effectiveness of Big Visual Data


The emergence of large image and video datasets on the Internet, together with parallel computers, GPUs, and algorithms such as deep learning, has enabled significant breakthroughs in computer vision over the past decade. This class discusses advanced topics in computer vision where the use of Big Visual Data is changing the nature of the problem. We will focus on leveraging Big Visual Data to bring about new ways of looking at the vision problem. The emphasis is on fundamental concepts (rather than theory or applications) of computer vision and artificial intelligence. This class requires a solid background in computer vision and machine learning. The prerequisite is COS429 or equivalent.


Caffe Tutorial | Zhirong Wu | Slides
GoogLeNet | Fisher Yu | Slides
Recurrent Neural Network | Zhirong Wu | Slides
Credit Assignment in NN | Prof. David Balduzzi
Adversary Network | Linguang Zhang | Slides
Neural Turing Machine | Zhirong Wu | Slides
Deep Learning for NLP | Kiran N. Vodrahalli | Slides
(cont.) | Kiran N. Vodrahalli
(cont.) | Kiran N. Vodrahalli
Image Captioning | Kiran N. Vodrahalli | Slides
Question Answering Machine | Shuran Song | Slides
@article{ferrucci2010building,
  title={Building Watson: An overview of the DeepQA project},
  author={Ferrucci, David and Brown, Eric and Chu-Carroll, Jennifer and Fan, James and Gondek, David and Kalyanpur, Aditya A and Lally, Adam and Murdock, J William and Nyberg, Eric and Prager, John and others},
  journal={AI magazine},
  volume={31},
  number={3},
  pages={59--79},
  year={2010}
}
@inproceedings{Sirius,
  author = {Hauswald, Johann and Laurenzano, Michael A. and Zhang, Yunqi and Li, Cheng and Rovinski, Austin and Khurana, Arjun and Dreslinski, Ron and Mudge, Trevor and Petrucci, Vinicius and Tang, Lingjia and Mars, Jason},
  title = {Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers},
  booktitle = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  series = {ASPLOS '15},
  year = {2015},
  numpages = {13},
  publisher = {ACM},
  address = {New York, NY, USA},
  note = {Acceptance Rate: 17\%},
}
Knowledge Base and Common Sense | Yinda Zhang | Slides
@incollection{zhu2014reasoning,
  title={Reasoning about object affordances in a knowledge base representation},
  author={Zhu, Yuke and Fathi, Alireza and Fei-Fei, Li},
  booktitle={Computer Vision--ECCV 2014},
  pages={408--424},
  year={2014},
  publisher={Springer}
}
Lifelong Visual Mapping | Linguang Zhang | Slides
Probabilistic Programming | Zhirong Wu | Slides
Probabilistic Programming (Cont.) | Zhirong Wu
3D Shape Representation | Tianqiang Liu | Slides
3D Shape Representation (Cont.) | Tianqiang Liu
Video Object Recognition | Chenyi Chen | Slides
Pawan Sinha TED talk (08:24)
@article{kalogeiton2015analysing,
  title={Analysing domain shift factors between videos and images for object detection},
  author={Kalogeiton, Vicky and Ferrari, Vittorio and Schmid, Cordelia},
  journal={arXiv preprint arXiv:1501.01186},
  year={2015}
}
@inproceedings{prest2012learning,
  title={Learning object class detectors from weakly annotated video},
  author={Prest, Alessandro and Leistner, Christian and Civera, Javier and Schmid, Cordelia and Ferrari, Vittorio},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on},
  pages={3282--3289},
  year={2012},
  organization={IEEE}
}
@inproceedings{papazoglou2013fast,
  title={Fast object segmentation in unconstrained video},
  author={Papazoglou, Anestis and Ferrari, Vittorio},
  booktitle={Computer Vision (ICCV), 2013 IEEE International Conference on},
  pages={1777--1784},
  year={2013},
  organization={IEEE}
}
@article{ostrovsky2009visual,
  title={Visual parsing after recovery from blindness},
  author={Ostrovsky, Yuri and Meyers, Ethan and Ganesh, Suma and Mathur, Umang and Sinha, Pawan},
  journal={Psychological Science},
  volume={20},
  number={12},
  pages={1484--1491},
  year={2009},
  publisher={SAGE Publications}
}
@book{valvo1971sight,
  title={Sight restoration after long term blindness: The problems and behavior patterns of visual rehabilitation},
  author={Valvo, Alberto},
  year={1971},
  publisher={American Foundation for the Blind}
}
@article{Wu12Eulerian,
  author = {Hao-Yu Wu and Michael Rubinstein and Eugene Shih and John Guttag and Fr\'{e}do Durand and William T. Freeman},
  title = {Eulerian Video Magnification for Revealing Subtle Changes in the World},
  journal = {ACM Trans. Graph. (Proceedings SIGGRAPH 2012)},
  year = {2012},
  volume = {31},
  number = {4},
}
@inproceedings{butko2006learning,
  title={Learning about humans during the first 6 minutes of life},
  author={Butko, Nicholas J and Fasel, I and Movellan, Javier R},
  booktitle={International Conference on Development and Learning, Indiana},
  year={2006}
}
Deep Learning for Videos | Sachin Ravi
Deep Learning for Speech Recognition | Jeremy Cohen | Microsoft MAVIS
@article{hinton2012deep,
  title={Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups},
  author={Hinton, Geoffrey and Deng, Li and Yu, Dong and Dahl, George E and Mohamed, Abdel-rahman and Jaitly, Navdeep and Senior, Andrew and Vanhoucke, Vincent and Nguyen, Patrick and Sainath, Tara N and others},
  journal={Signal Processing Magazine, IEEE},
  volume={29},
  number={6},
  pages={82--97},
  year={2012},
  publisher={IEEE}
}
@article{dahl2012context,
  title={Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition},
  author={Dahl, George E and Yu, Dong and Deng, Li and Acero, Alex},
  journal={Audio, Speech, and Language Processing, IEEE Transactions on},
  volume={20},
  number={1},
  pages={30--42},
  year={2012},
  publisher={IEEE}
}
Deep Learning for Object Detection | Gabriel Huang
Attention and Low Resolution | Pingmei Xu
Attention and Low Resolution
Simulation and Knowledge Representation for Robotics | Shuran Song
Computer Vision as Inverse Graphics | Yinda Zhang
Deep Learning for Object Detection
Latest News on Neural Networks for Classification

Tentative Topics:

Class Requirement:

Other Papers:

