On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes – a kitchen, an elevator, your office – and you effortlessly recognize them and perceive their 3D structure. But this one-minute scene-understanding problem has been an open challenge in computer vision since the field was first established 50 years ago. In this class, we will learn state-of-the-art algorithms and study how to build computer systems that automatically understand visual scenes, both inferring the semantics and extracting 3D structure.
This course requires programming experience as well as basic linear algebra. Previous knowledge of visual computing will be helpful.
Many of the following slides are modified from the excellent class notes of similar courses offered by Profs. Fredo Durand, Alexei Efros, Rob Fergus, William Freeman, Thomas Funkhouser, James Hays, Kristen Grauman, Svetlana Lazebnik, Fei-Fei Li, Srinivasa Narasimhan, Aude Oliva, Szymon Rusinkiewicz, Silvio Savarese, Steve Seitz, Noah Snavely, Richard Szeliski, Antonio Torralba, Yair Weiss, and Li Zhang. We are extremely grateful to them.
The following schedule is preliminary and subject to change as the term evolves.
| W | L | Date | Topic | Slide | Reading | Deadline |
|---|---|---|---|---|---|---|
| 1 | 1 | Thu Sep 11 | How to save the world? | Introduction | Sz1 | |
| 2 | 2 | Tue Sep 16 | What is a camera? Accidental pinhole! | Image Formulation | Sz2 Acc | |
| | 3 | Thu Sep 18 | What color is the sky? | Color | Sz2 | |
| | | Fri Sep 19 | Precept 1 [Friend Center 006 12:00-1:00] | Precept1 | | |
| 3 | 4 | Tue Sep 23 | Let's test your beer goggles | Filtering | Sz3 | |
| | 5 | Thu Sep 25 | Natural image statistics | Statistics | Sz10 | |
| 4 | 6 | Tue Sep 30 | Let's make the world simpler. | Edges and Lines | | PS1 due |
| | 7 | Thu Oct 2 | Where have all the flowers gone? (Andras Ferencz @ Mobileye) | Tracking | Sz8 | |
| 5 | 8 | Tue Oct 7 | Superman vision: revealing invisible changes in the world | Motion | Sz8 Wig Mag Mic | |
| | 9 | Thu Oct 9 | Life story of a 3D point | Geometry | Sz2 Sz7 | |
| | | Fri Oct 10 | Precept 2 [CS105 12:20-1:30] | Precept2 | | |
| 6 | 10 | Tue Oct 14 | Life story of a 3D point (continued) | | | |
| | 11 | Thu Oct 16 | The fundamental matrix song | Multi-view Geometry | Sz2 Sz7 Sz4 Sz6 Sz9 | PS2 due |
| 7 | 12 | Tue Oct 21 | The fundamental matrix song (continued) | | | |
| | | Wed Oct 22 | Precept 3 [COS105 12:30-13:30] | Precept3 | | |
| | 13 | Thu Oct 23 | Building Rome in a day | Structure From Motion | Sz11 Sz12 | |
| 8 | | Tue Oct 28 | No class (Fall Recess) | | | |
| | | Thu Oct 30 | No class (Fall Recess) | | | |
| 9 | 14 | Tue Nov 4 | Where is "The Birth of Venus"? | Image-based Modeling | Sz13 | PS3 due |
| | 15 | Thu Nov 6 | What is a chair? | Object Detection | Sz14 | |
| 10 | 16 | Tue Nov 11 | Guest lecture (Kevin Zhou @ Siemens) | Medical Image Parsing | | |
| | 17 | Thu Nov 13 | Guest lecture (Kevin Zhou @ Siemens) | Medical Image Parsing | | |
| 11 | | Mon Nov 17 | Precept 4 [CS105 12:20-1:30] | Precept4 | | |
| | 18 | Tue Nov 18 | Using the forest to see the trees | Scene Understanding | Sz14 | |
| | 19 | Thu Nov 20 | Using the forest to see the trees (continued) | | | |
| 12 | 20 | Tue Nov 25 | The unreasonable effectiveness of big visual data | Data-driven Brute-force | | PS4 due |
| | | Thu Nov 27 | No class (Thanksgiving Recess) | | | |
| 13 | 21 | Tue Dec 2 | RGB-D object recognition (Shuran Song) | RGB-D Recognition | Sliding Shapes | |
| | | Wed Dec 3 | Precept 5 [CS105 12:20-1:30] | Precept5 | | |
| | 22 | Thu Dec 4 | Final project proposal presentation | | | Proposal due |
| 14 | 23 | Tue Dec 9 | Shape Matching (Prof. Tom Funkhouser) | Shape Matching | | |
| | 24 | Thu Dec 11 | This is how the brain really works + Quiz | Deep Learning | | PS5 due |
| | | Tue Jan 13 | | | | Final report due |
We will use Matlab as our programming language in this class. It is simple to learn, and you should be able to pick it up in 1-2 hours. Princeton OIT provides Matlab for installation (please install the latest version, R2014a, with all available toolboxes). If you have no experience with Matlab, please attend the following Matlab training workshop by the Keller Center (registration required).
We expect the report to be between 2 and 8 pages. If you have more members in your team, we expect your report to be longer. Your report should be written using the CVPR LaTeX template (downloaded from http://www.pamitc.org/cvpr15/files/cvpr2015AuthorKit.zip) and compiled into a PDF file.
Here is a guideline for the report writing:
1. Define the problem:
What is the input?
What is the output?
2. Motivation:
Why do this?
What makes it difficult?
3. Related work:
What has been done? What are the problems?
4. Algorithm:
What did you try? What are the alternatives? Why did you choose to try this? Justify your design decisions.
5. Result:
Visualization of your result: screen capture, plot, example outputs.
Add figures into your report.
6. Evaluation:
Does it work? How well does it work?
7. Analysis:
Why does it work or not work?
Can we make it better? How?
8. [For 2 or 3 person teams] Contribution division: Who did what?
The COS 429 assignment collaboration policy is derived from that of Princeton's COS 217 ...
Programming is an individual creative process much like composition. You must reach your own understanding of the problem and discover a path to its solution. During this time, discussions with other people are permitted and encouraged. However, when the time comes to write code that solves the problem, such discussions (except with course staff members) are no longer appropriate: the code must be your own work. If you have a question about how to use some feature of C, Unix, etc., you certainly can ask your friends or the teaching assistants, but specific questions about code you have written must be treated more carefully.
For each assignment you must specifically state, in your writeup file, the names of any individuals from whom you received help, and the nature of the help that you received. That includes help from friends, classmates, lab TAs, course staff members, etc.
Do not, under any circumstances, copy another person's code. Incorporating someone else's code into your code in any form is a violation of academic regulations. This includes adapting solutions or partial solutions to assignments from any offering of this course or any other course. There is one exception to the code-sharing rule: You may adapt code from the COS 429 course materials provided that you explain what code you use, and cite its source in your writeup file.
Copying and transforming someone else's code (by rearranging independent code, renaming variables, rewording comments, etc.) is plagiarism. Some inexperienced programmers have the misconception that detecting such plagiarism is difficult. Actually, detecting such plagiarism is quite easy. Not only does such plagiarism quickly identify itself during the grading process, but also we can (and do) use software packages, such as Alex Aiken's renowned MOSS software, for automated help.
If we suspect a student of plagiarism on an assignment, then we will refer the case to the Committee on Discipline. If the Committee on Discipline finds the student guilty of plagiarism, then the standard penalty is automatic failure of the COS 429 course. The Committee on Discipline may impose additional penalties.
For each assignment you must specifically state, in your writeup file, the names of any individuals to whom you provided help, and the nature of the help that you provided.
Abetting plagiarism or unauthorized collaboration by "sharing" your code is prohibited. Sharing code in digital form is an especially egregious violation. Do not e-mail your code or make your code available to anyone. Do not share your code with anyone even after the due date/time of the assignment.
You are responsible for keeping your solutions to the COS 429 programming assignments away from prying eyes. If someone else copies your code, we have no way to determine who is the owner and who is the copier; the Committee on Discipline decides. If you are working on a public cluster computer, make sure that you do not leave the computer unattended, and that you delete your local files and log out before leaving.
You should store all of your assignment files in a private directory. You can create a private directory using commands similar to these:
$ mkdir cos429
$ chmod 700 cos429
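To confirm that the permissions took effect, one option is to list the directory itself rather than its contents. This is only a minimal sketch, assuming a Unix-like shell and the same directory name (cos429) as in the commands above; the `-p` flag is added here just so the sketch can be re-run without an error if the directory already exists.

```shell
# Create the directory (no error if it already exists) and make it owner-only.
mkdir -p cos429
chmod 700 cos429

# Show the directory entry itself; the permission string should start with
# "drwx" and show no read/write/execute access for group or others.
ls -ld cos429
```

If group or others still show access in the listing, re-run the chmod command on the directory before placing any assignment files in it.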
If you have a question or comment that will be helpful to other students, and you need not reveal any parts of your work to express the question or comment properly, then you should post it to the course's Piazza page. One of the course's instructors will reply as soon as possible. We welcome replies from other students, and may "endorse" a student's response instead of composing an instructor's response.
If you have a question or comment that will not be helpful to other students, or if you must reveal parts of your work to express your question or comment adequately, then you should post it privately to the appropriate preceptor on Piazza.
There is no textbook for this class. The main reference book and reading assignments will be based on:
Other references:
Vision:
Learning:
Graphical models:
Computer Vision Class at Princeton
By Antonio Torralba at MIT:
By Alyosha Efros at CMU/Berkeley:
By James Hays at Brown:
By Noah Snavely at Cornell:
By Steven Seitz at UW:
By others:
Including PDF slides, links to supplementary reading, and a drill question for each video. The site contains a set of video lectures on a subset of computer vision. It is intended for viewers who have an understanding of the nature of images and some understanding of how they can be processed. The course is more like Computer Vision 102, introducing a range of standard and accepted methods, rather than the latest research advances.
Because of the improvements in the content available in Wikipedia, it is now possible to find content for more than 50% of CVonline's 2000 topics. CVonline groups together the topics into a sensible topic hierarchy, but tries to exploit the advancing quality and breadth of Wikipedia's content.
contain many links to Tutorials and Surveys, Explanations, Online Demos, Datasets, Books, Code for: Symbolic pattern recognition, Statistical pattern recognition, Machine learning, 1D Signal pattern recognition and 2D Image analysis and computer vision.
HIPR2 is a free web-based set of tutorial materials for the 50 most commonly used image processing operators. It contains tutorial text, sample results and Java demonstrations of individual operators and collections.
These are the free-view terms A..G from the first version of the Dictionary, published by John Wiley and Sons. (Note that there is a second edition currently on sale.)