COS429 Fall 2014: Computer Vision

Computer Vision, art by kirkh.deviantart.com

Overview:

On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes – a kitchen, an elevator, your office – and you effortlessly recognize them and perceive their 3D structure. But this one-minute scene-understanding problem has been an open challenge in computer vision since the field was first established 50 years ago. In this class, we will learn state-of-the-art algorithms and study how to build computer systems that automatically understand visual scenes, both inferring the semantics and extracting 3D structure.

This course requires programming experience as well as basic linear algebra. Previous knowledge of visual computing will be helpful.

Instructor: Jianxiong Xiao
TAs:
- Brian Matejek (bmatejek [at] princeton )
- Pingmei Xu (pingmeix [at] princeton )
- Shuran Song (shurans [at] princeton )
Time: Tuesday, Thursday, 3:00PM - 4:20PM
Location for Lecture: Friend Center 006
Office Hour: Friday 1-2PM (Location CS003)
Online Discussion: http://piazza.com
Registrar Link: Class Number 20663

Assignments:

Schedule:

Many of the following slides are modified from the excellent class notes of similar courses offered by Prof. Fredo Durand, Alexei Efros, Rob Fergus, William Freeman, Thomas Funkhouser, James Hays, Kristen Grauman, Svetlana Lazebnik, Fei-Fei Li, Srinivasa Narasimhan, Aude Oliva, Szymon Rusinkiewicz, Silvio Savarese, Steve Seitz, Noah Snavely, Richard Szeliski, Antonio Torralba, Yair Weiss, and Li Zhang. We are extremely grateful to them.

The following schedule is preliminary and subject to change as the term evolves.

W	L	Date	Topic	Slide	Reading	Deadline
1	1	Thu Sep 11	How to save the world?	Introduction	Sz1
2	2	Tue Sep 16	What is a camera? Accidental pinhole!	Image Formulation	Sz2 Acc
	3	Thu Sep 18	What color is the sky?	Color	Sz2
		Fri Sep 19	Precept 1 [Friend Center 006 12:00-1:00]	Precept1
3	4	Tue Sep 23	Let's test your beer goggles	Filtering	Sz3
3	5	Thu Sep 25	Natural image statistics	Statistics	Sz10
4	6	Tue Sep 30	Let's make the world simpler.	Edges and Lines		PS1 due
4	7	Thu Oct 2	Where have all the flowers gone? (Andras Ferencz @ Mobileye)	Tracking	Sz8
5	8	Tue Oct 7	Superman vision: revealing invisible changes in the world	Motion	Sz8 Wig Mag Mic
	9	Thu Oct 9	Life story of a 3D point	Geometry	Sz2 Sz7
		Fri Oct 10	Precept 2 [CS105 12:20-1:30]	Precept2
6	10	Tue Oct 14	Life story of a 3D point (continued)
6	11	Thu Oct 16	The fundamental matrix song	Multi-view Geometry	Sz2 Sz7 Sz4 Sz6 Sz9	PS2 due
7	12	Tue Oct 21	The fundamental matrix song (continued)	Multi-view Geometry	Sz2 Sz7 Sz4 Sz6 Sz9
		Wed Oct 22	Precept 3 [COS105 12:30-13:30]	Precept3
	13	Thu Oct 23	Building Rome in a day	Structure From Motion	Sz11 Sz12
8		Tue Oct 28	No class (Fall Recess)
8		Thu Oct 30	No class (Fall Recess)
9	14	Tue Nov 4	Where is "the birth of venus"?	Image-based Modeling	Sz13	PS3 due
9	15	Thu Nov 6	What is a chair?	Object Detection	Sz14
10	16	Tue Nov 11	Guest lecture (Kevin Zhou @ Siemens)	Medical Image Parsing
10	17	Thu Nov 13	Guest lecture (Kevin Zhou @ Siemens)	Medical Image Parsing
11		Mon Nov 17	Precept 4 [CS105 12:20-1:30]	Precept4
	18	Tue Nov 18	Using the forest to see the trees	Scene Understanding	Sz14
	19	Thu Nov 20	Using the forest to see the trees (continued)	Scene Understanding	Sz14
12	20	Tue Nov 25	The unreasonable effectiveness of big visual data	Data-driven Brute-force		PS4 due
12		Thu Nov 27	No class (Thanksgiving Recess)
13	21	Tue Dec 2	RGB-D object recognition (Shuran Song)	RGB-D Recognition	Sliding Shapes
		Wed Dec 3	Precept 5 [CS105 12:20-1:30]	Precept5
	22	Thu Dec 4	Final project proposal presentation			Proposal due
14	23	Tue Dec 9	Shape Matching (Prof. Tom Funkhouser)	Shape Matching
14	24	Thu Dec 11	This is how the brain really works + Quiz	Deep Learning		PS5 due
		Tue Jan 13	Final report due

Matlab Workshop:

We will use Matlab as our programming language in this class. It is very simple, and you should be able to pick it up in 1-2 hours. Princeton OIT provides Matlab for installation (please install the latest version, R2014a, with all available toolboxes). If you have no experience with Matlab, please attend the following Matlab training workshop offered by the Keller Center (registration required).

Date: September 15-17, 2014
Time: 6:30 - 9 p.m.
Location: Friend Center Auditorium (101)
Registration: Required
Information: https://commons.princeton.edu/kellercenter/2014/08/matlab-and-r-eworkshop-to-be-held-september-15-17.html

Requirements/Grading:

Grading will be based on our assessment of your understanding of the class material, and will be composed roughly of:

Design Project - 30%
Programming Assignments - 60%
Class/Precept Participation (In-class Quizzes) - 10%

Final Project:

The final project will allow you to explore in depth a topic covered in class that you found interesting and would like to know more about. Projects can be performed in teams of 1-3 students. During the semester we will propose ideas for projects in the problem sets and lectures, and we also encourage students to come up with their own ideas that entice them. The topic for the final project and its scope should be approved by the class staff. Overall, the final project consists of (a) a project proposal, (b) a class presentation of your proposal, and (c) a report (and source code) documenting your work, results and conclusions. You are encouraged to be as creative as possible.

Final Project Report:

We expect the report to be between 2 and 8 pages. If you have more members in your team, we expect your report to be longer. Your report should be written using the CVPR LaTeX template (which can be downloaded from http://www.pamitc.org/cvpr15/files/cvpr2015AuthorKit.zip) and compiled into a PDF file.

Here is a guideline for writing the report:
1. Define the problem: What is the input? What is the output?
2. Motivation: Why do this? What makes it difficult?
3. Related work: What has been done? What are the problems?
4. Algorithm: What did you try? What are the alternatives? Why did you choose to try this? Justify your design decisions.
5. Result: Visualization of your result: screen capture, plot, example outputs. Add figures to your report.
6. Evaluation: Does it work? How well does it work?
7. Analysis: Why does it work or not work? Can we make it better? How?
8. [For 2 or 3 person teams] Contribution division: Who did what?

Communication:

All course information, announcements and materials will be available on this class website. Announcements will also be sent to the class email list COS429_F2014 [the at sign] princeton.edu (make sure you are registered). Piazza (piazza.com) is used as the discussion forum for the class. We encourage students to post questions and remarks about the material and assignments in that forum.

Late Policy:

Assignments are due at 11:59PM on the due date, as determined by the file date of the file upload. Late assignments are marked down 1/4 of the full grade per day. One minute late is the same as one day late. One late submission of up to one week is allowed, no questions asked, and you can use it at your discretion (don't use it needlessly). Any additional unapproved late submission will be considered unsubmitted work. Late submission is not allowed for the final project, the class presentation, or the quiz.
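
As an illustration of the arithmetic only, the markdown rule above can be sketched as a small shell function. This is a hedged sketch, not the staff's grading script: the function name `late_penalty_percent` is hypothetical, the cap at 100% is an assumption, and the one-time no-questions-asked late submission is deliberately not modeled.

```shell
# Hypothetical helper: percent of the full grade deducted for a submission
# that arrives $1 seconds after the 11:59PM deadline (0 or less = on time).
late_penalty_percent() {
    seconds_late=$1
    if [ "$seconds_late" -le 0 ]; then
        echo 0
        return
    fi
    # Round up to whole days: one minute late counts as one full day.
    days_late=$(( (seconds_late + 86399) / 86400 ))
    # 1/4 of the full grade per day, i.e. 25 percentage points.
    penalty=$(( days_late * 25 ))
    # Assumption: the deduction cannot exceed the full grade.
    if [ "$penalty" -gt 100 ]; then
        penalty=100
    fi
    echo "$penalty"
}

late_penalty_percent 60       # one minute late  -> prints 25
late_penalty_percent 90000    # 25 hours late    -> prints 50
```

Under these assumptions, a submission that is one minute late and one that is 23 hours late lose the same 25%.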

Extra Credit Policy:

Most assignments will include an opportunity for "extra credit." Please note that we will not explicitly award extra points to the assignment score for this extra credit -- i.e., it is not possible to receive more than 100% on any assignment. However, we will consider extra credit when assigning grades to students near a grade boundary (e.g., it might push a borderline B+ up to an A-, or an A up to an A+).

Academic Integrity:

All students pledge to adhere to the Honor Code in the conduct of all assignments, quizzes and final projects that take place in class. We take this very seriously. Your submission will be checked both manually by the TAs and automatically by state-of-the-art software. We will use an automatic program to run your code and compare it with other students' code (from both this year and all previous years) and with publicly available implementations (e.g., found through Google, Bing, or GitHub), in order to verify your results and detect plagiarism. The software is very powerful, and we have a zero-tolerance policy on plagiarism. For more information, please refer to the university policy here.

Collaboration Policy

The COS 429 assignment collaboration policy is derived from that of Princeton's COS 217 ...

Concerning receiving help from others...

Programming is an individual creative process much like composition. You must reach your own understanding of the problem and discover a path to its solution. During this time, discussions with other people are permitted and encouraged. However, when the time comes to write code that solves the problem, such discussions (except with course staff members) are no longer appropriate: the code must be your own work. If you have a question about how to use some feature of C, Unix, etc., you certainly can ask your friends or the teaching assistants, but specific questions about code you have written must be treated more carefully.

For each assignment you must specifically state, in your writeup file, the names of any individuals from whom you received help, and the nature of the help that you received. That includes help from friends, classmates, lab TAs, course staff members, etc.

Do not, under any circumstances, copy another person's code. Incorporating someone else's code into your code in any form is a violation of academic regulations. This includes adapting solutions or partial solutions to assignments from any offering of this course or any other course. There is one exception to the code-sharing rule: You may adapt code from the COS 429 course materials provided that you explain what code you use, and cite its source in your writeup file.

Copying and transforming someone else's code (by rearranging independent code, renaming variables, rewording comments, etc.) is plagiarism. Some inexperienced programmers have the misconception that detecting such plagiarism is difficult. Actually, detecting such plagiarism is quite easy. Not only does such plagiarism quickly identify itself during the grading process, but also we can (and do) use software packages, such as Alex Aiken's renowned MOSS software, for automated help.

If we suspect a student of plagiarism on an assignment, then we will refer the case to the Committee on Discipline. If the Committee on Discipline finds the student guilty of plagiarism, then the standard penalty is automatic failure of the COS 429 course. The Committee on Discipline may impose additional penalties.

Concerning providing help to others...

For each assignment you must specifically state, in your writeup file, the names of any individuals to whom you provided help, and the nature of the help that you provided.

Abetting plagiarism or unauthorized collaboration by "sharing" your code is prohibited. Sharing code in digital form is an especially egregious violation. Do not e-mail your code or make your code available to anyone. Do not share your code with anyone even after the due date/time of the assignment.

You are responsible for keeping your solutions to the COS 429 programming assignments away from prying eyes. If someone else copies your code, we have no way to determine who is the owner and who is the copier; the Committee on Discipline decides. If you are working on a public cluster computer, make sure that you do not leave the computer unattended, and that you delete your local files and log out before leaving.

You should store all of your assignment files in a private directory. You can create a private directory using commands similar to these:

$ mkdir cos429
$ chmod 700 cos429
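
To double-check that the directory really is private, you could inspect its permission bits afterward. The following is a minimal sketch, assuming a POSIX shell and the `cos429` directory name used above; the `cut` step only trims the long listing down to the type and permission field.

```shell
# Create the directory (if it does not exist yet) and restrict it to the owner.
mkdir -p cos429
chmod 700 cos429

# Keep only the first 10 characters of the long listing:
# the file type followed by the owner/group/other permission bits.
perms=$(ls -ld cos429 | cut -c1-10)
echo "$perms"
```

On a typical setup this prints `drwx------`, meaning that only the owner can read, write, or enter the directory. If any `r`, `w`, or `x` appears in the last six positions, other users still have some access.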

Concerning electronic communication...

If you have a question or comment that will be helpful to other students, and you need not reveal any parts of your work to express the question or comment properly, then you should post it to the course's Piazza page. One of the course's instructors will reply as soon as possible. We welcome replies from other students, and may "endorse" a student's response instead of composing an instructor's response.

If you have a question or comment that will not be helpful to other students, or if you must reveal parts of your work to express your question or comment adequately, then you should post it privately to the appropriate preceptor on Piazza.

Final note

Please do not publish solutions to programming assignments in a way that could compromise their utility as pedagogical tools. At Princeton, this is a violation of the basic rights, rules and responsibilities of members of the university community.

Books:

Reference Books

There is no textbook for this class. The main reference book, on which the reading assignments will mainly be based, is:

[Sz] Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010 (Free Online Draft Available Here)

Other references:

Vision:

[HZ] Hartley and Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004
[FP] Forsyth and Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2002
[Pa] Palmer, Vision Science, MIT Press, 1999

Learning:

[BGC] Bengio, Goodfellow and Courville, Deep Learning, MIT Press, 2014
[Mi] Mitchell, Machine Learning, McGraw-Hill, 1997
[DHS] Duda, Hart and Stork, Pattern Classification (2nd Edition), Wiley-Interscience, 2000

Graphical models:

[KF] Koller and Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009

Resources:

Related Courses:

Computer Vision Class at Princeton

By Antonio Torralba at MIT:

By Alyosha Efros at CMU/Berkeley:

By James Hays at Brown:

By Noah Snavely at Cornell:

By Steven Seitz at UW:

Computer Vision

By others:

Deep Learning Tutorial, by Yoshua Bengio
Advanced Vision, by Bob Fisher
Deep Learning Summer School 2012
Multiple View Geometry, by Marc Pollefeys
Geometry, by Andrew Zisserman
Recognizing People, Objects and Actions, by Jitendra Malik
Introduction to Computer Vision, by Michael Black
Computer Vision, by Kristen Grauman
Computer Vision, by Rob Fergus
Introduction to Computer Vision, by Fei-Fei Li
The Computer Vision Industry

Code and Datasets

SUN database
SUN360 panorama database
Scene Classification Benchmark
The Steerable Pyramid
DrawMe: a light-weight Javascript library for line drawing on a picture.
Structural SVM
Template Matching
Representation and Synthesis of Visual Texture, Portilla & Simoncelli
Berkeley Segmentation
Pb
Superpixels
Structure from Motion for Unordered Image Collections
Peter Kovesi's Functions for Computer Vision
SIFT implementation by Andrea Vedaldi
Affine Covariant Features
A simple object detector with boosting
OpenCV

Resource collection by Bob Fisher

Video lectures - 15 hours of video in c. 10 minute blocks.

Including PDF slides, links to supplementary reading, and a drill question for each video. The site contains a set of video lectures on a subset of computer vision. It is intended for viewers who have an understanding of the nature of images and some understanding of how they can be processed. The course is more like Computer Vision 102, introducing a range of standard and accepted methods, rather than the latest research advances.

CVonline is a free WWW-based set of introductions to topics in computer vision.

Because of the improvements in the content available in Wikipedia, it is now possible to find content for more than 50% of CVonline's 2000 topics. CVonline groups the topics together into a sensible topic hierarchy, but tries to exploit the advancing quality and breadth of Wikipedia's content.

CVonline has a variety of supplemental information useful to students and researchers, namely lists of:

The education resources of the Int. Assoc. for Pattern Recognition

contain many links to Tutorials and Surveys, Explanations, Online Demos, Datasets, Books, Code for: Symbolic pattern recognition, Statistical pattern recognition, Machine learning, 1D Signal pattern recognition and 2D Image analysis and computer vision.

HIPR2: free WWW-based Image Processing Teaching Materials with JAVA

HIPR2 is a free WWW-based set of tutorial materials for the 50 most commonly used image processing operators. It contains tutorial text, sample results and JAVA demonstrations of individual operators and collections.

CVDICT: Dictionary of Computer Vision and Image Processing

These are the free-view terms A..G from the first version of the Dictionary, published by John Wiley and Sons. (Note that there is a second edition currently on sale.)