CAP6412: Advanced Computer Vision (Spring 2016)





Lecture time: Tuesday and Thursday, 3:00pm--4:15pm

Lecture place: HEC 0117

Instructor: Boqing Gong
Email: bgong @ crcv.ucf.edu (Please put [CAP6412] in the subject line when you email me.)
Office: HEC 2014
Office hours: Tuesday 4:30--5:30pm & by appointment

Course overview:

This is an advanced course in computer vision. We will examine central topics and key techniques in computer vision, mainly by reading, reviewing, presenting, and discussing recent and classic papers published in top computer vision conferences and journals. Candidate topics are listed below.



The main goal of the course is to prepare students for graduate research in computer vision. Students are expected to gain an in-depth understanding of the state-of-the-art approaches to topics selected jointly by the students and the instructor. Beyond deeper domain knowledge in computer vision, by the end of the course students will also have developed skills vital to graduate research, such as writing paper reviews, presenting technical papers, analyzing the strengths and weaknesses of research papers, and identifying open questions and directions for future research.




Late Homework Policy:

Each student has a total of three late days to use across all reports and projects. No additional late days will be granted.

Weekly schedule:

Date
Topics
Papers and links
Presenters   
Items due
01/12
Course introduction
Template for paper review
Boqing Gong
[Slides]
01/14
Fundamentals of CNN
Fareeha Irfan
[Slides]
Preferred topics due
01/14, 1:00pm

Sign up here for presentations
01/19
CNN & object recognition
  • {Major} [152 layers] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385 (2015).
  • {Secondary} [ILSVRC] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. “ImageNet large scale visual recognition challenge.” International Journal of Computer Vision (2014): 1-42.
Dustin Morley
[Slides]

01/21
Understanding CNN
  • {Major} [Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision–ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
  • {Secondary} Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” arXiv preprint arXiv:1412.6856 (2014).
Jason Tiller
[Slides]
Paper review of [Visualization] due
01/21, 3pm
01/26
Detection proposals
  • J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? BMVC 2014.
  • {Major} [Detection Proposals] J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? PAMI 2015.
  • {Major} [Faster R-CNN] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Samer Iskander
[Slides]
Paper review of [Detection Proposals]
due 01/26, 12pm
01/28
R-CNN
  • {Major} [R-CNN] Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. "Rich feature hierarchies for accurate object detection and semantic segmentation." In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587. IEEE, 2014.
  • [Fast R-CNN] Girshick, Ross. "Fast R-CNN." arXiv preprint arXiv:1504.08083 (2015).
Syed Ahmed
[Slides]

02/02
Image captioning
  • {Major} [Image captioning] Karpathy, Andrej, and Li Fei-Fei. “Deep visual-semantic alignments for generating image descriptions.” arXiv preprint arXiv:1412.2306 (2014).
  • Mao, Junhua, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. “Explain images with multimodal recurrent neural networks.” arXiv preprint arXiv:1410.1090 (2014).
  • Donahue, Jeff, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. “Long-term recurrent convolutional networks for visual recognition and description.” arXiv preprint arXiv:1411.4389 (2014).
  • Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. “Show and tell: A neural image caption generator.” arXiv preprint arXiv:1411.4555 (2014).
  • Lebret, Rémi, Pedro O. Pinheiro, and Ronan Collobert. “Phrase-based image captioning.” arXiv preprint arXiv:1502.03671 (2015).
  • Chen, Xinlei, and C. Lawrence Zitnick. “Mind’s eye: A recurrent visual representation for image caption generation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2422-2431. 2015.
  • Kiros, Ryan, Ruslan Salakhutdinov, and Richard S. Zemel. “Unifying visual-semantic embeddings with multimodal neural language models.” arXiv preprint arXiv:1411.2539 (2014).
Harish Ravi Prakash
[Slides]
Paper review of [Image captioning]
due 02/02, 12pm

Project I posted!
Due: 02/28, 11:59pm
02/04
Attention modeling
  • {Major} [Attention modeling] Xu, Kelvin, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” arXiv preprint arXiv:1502.03044 (2015).
Karan Daei-Mojdehi
[Slides]

Paper review of [Attention modeling]
due 02/04, 12pm
02/09
Low-level vision:
Super-resolution

{Major} [Super-resolution] Dong, Chao, Chen Change Loy, Kaiming He, and Xiaoou Tang. “Learning a deep convolutional network for image super-resolution.” In Computer Vision–ECCV 2014, pp. 184-199. Springer International Publishing, 2014. (See the extended version on arXiv: http://arxiv.org/pdf/1501.00092.pdf)

Riegler, Gernot, Samuel Schulter, Matthias Ruther, and Horst Bischof. "Conditioned Regression Models for Non-Blind Single Image Super-Resolution." In Proceedings of the IEEE International Conference on Computer Vision, pp. 522-530. 2015.

Liao, Renjie, Xin Tao, Ruiyu Li, Ziyang Ma, and Jiaya Jia. "Video Super-Resolution via Deep Draft-Ensemble Learning." In Proceedings of the IEEE International Conference on Computer Vision, pp. 531-539. 2015.

Jose Sanchez
[Slides]
Paper review of [Super-resolution]
due 02/09, 12pm
02/11
Low-level vision:
Edge detection

{Major} [Edge detection] Xie, Saining, and Zhuowen Tu. “Holistically-Nested Edge Detection.” In Proceedings of the IEEE International Conference on Computer Vision, 2015.

Liu, Ziwei, Xiaoxiao Li, Ping Luo, Chen-Change Loy, and Xiaoou Tang. "Semantic image segmentation via deep parsing network." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1377-1385. 2015.

Yu, Yizhou, Chaowei Fang, and Zicheng Liao. "Piecewise Flat Embedding for Image Segmentation." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1368-1376. 2015.

Goran Igic
[Slides]
Deadline for discussing Option 2 of Project I: 02/11
02/16
Optical flow
{Major} [Optical flow] Fischer, Philipp, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. "FlowNet: Learning optical flow with convolutional networks." arXiv preprint arXiv:1504.06852 (2015).

Fleet, David, and Yair Weiss. "Optical flow estimation." In Handbook of mathematical models in computer vision, pp. 237-257. Springer US, 2006.

Revaud, Jerome, Philippe Weinzaepfel, Zaid Harchaoui, and Cordelia Schmid. "EpicFlow: Edge-preserving interpolation of correspondences for optical flow." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1164-1172. 2015.

Abdullah Jamal
[Slides]
Algorithm sketch of [Optical flow] due
02/16, 12pm

02/18
Pose estimation

{Major} [Pose estimation] Pfister, Tomas, James Charles, and Andrew Zisserman. “Flowing convnets for human pose estimation in videos.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 1913-1921. 2015.

Zhang, Dong, and Mubarak Shah. "Human Pose Estimation in Videos." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2012-2020. 2015.

Seguin, Guillaume, Karteek Alahari, Josef Sivic, and Ivan Laptev. "Pose estimation and segmentation of multiple people in stereoscopic movies." Pattern Analysis and Machine Intelligence, IEEE Transactions on 37, no. 8 (2015): 1643-1655.

Amar Kelu Nair
[Slides]
Algorithm sketch of [Pose estimation] due
02/18, 12pm
02/23
Visual question answering

{Major} [VQA-1] Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. “VQA: Visual Question Answering.” arXiv preprint arXiv:1505.00468 (2015).

Gao, Haoyuan, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. "Are you talking to a machine? Dataset and methods for multilingual image question answering." arXiv preprint arXiv:1505.05612 (2015).

Suhas Nithyanandappa
[Slides]

02/25
Visual question answering

{Major} [VQA-2] Malinowski, Mateusz, and Mario Fritz. “A multi-world approach to question answering about real-world scenes based on uncertain input.” In Advances in Neural Information Processing Systems, pp. 1682-1690. 2014.

Lin, Xiao, and Devi Parikh. "Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2984-2993. 2015.

Nandakishore Puttashamachar
[Slides]
Project I due on
02/28, 11:59pm
03/01
Visual question answering
{Major} [Relation Phrases] Sadeghi, Fereshteh, Santosh K. Divvala, and Ali Farhadi. “VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1456-1464. 2015.

Sadeghi, Fereshteh, C. Lawrence Zitnick, and Ali Farhadi. "VISALOGY: Answering Visual Analogy Questions." In Advances in Neural Information Processing Systems, pp. 1873-1881. 2015.

Javier Lores
[Slides]

03/03
OCR in the wild

{Major} [OCR in the wild] Jaderberg, Max, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. "Reading text in the wild with convolutional neural networks." International Journal of Computer Vision 116, no. 1 (2016): 1-20.

Zhu, Yingying, Cong Yao, and Xiang Bai. "Scene text detection and recognition: Recent advances and future trends." Frontiers of Computer Science 10, no. 1 (2016): 19-36.

Aisha Urooj
[Slides]

03/08
Spring break



03/10
Spring break


03/15
Aligning books with movies
[Book2Movie] Tapaswi, Makarand, Martin Bauml, and Rainer Stiefelhagen. "Book2movie: Aligning video scenes with book chapters." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1827-1835. 2015.

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In Proceedings of the IEEE International Conference on Computer Vision, pp. 19-27. 2015.

Fareeha Irfan
[Slides]
Algorithm sketch of [Book2Movie] due
03/15, 12pm
03/17
Visual Genome 1

[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations." arXiv preprint arXiv:1602.07332 (2016).

Shreyas Somashekar
[Slides]

03/22
DAG-CNN
[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations." arXiv preprint arXiv:1602.07332 (2016).

[DAG-CNN] Yang, Songfan, and Deva Ramanan. "Multi-scale recognition with DAG-CNNs." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1215-1223. 2015.

Shreyas Somashekar &
Niladri Basu Bal
[Slides]
Paper review of
[Visual Genome] due
03/22, 12pm
03/24
Transfer learning
[Transferability] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. “How transferable are features in deep neural networks?” In Advances in Neural Information Processing Systems, pp. 3320-3328. 2014.

Mert Ozerdem
[Slides]
Project II posted!
Due: 04/26, 11:59pm
03/29
Biomedical imaging

Sirinukunwattana, Korsuk, Shan-e-Ahmed Raza, Yee-Wah Tsang, David Snead, Ian A. Cree, and Nasir Mahmood Rajpoot. "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images." IEEE Transactions on Medical Imaging (2016).

Dustin Morley
[Slides]

03/31
3D computer vision

Liu, Fayao, Chunhua Shen, Guosheng Lin, and Ian Reid. "Learning depth from single monocular images using deep convolutional neural fields." (2015).

Karan Daei-Mojdehi
[Slides]

04/05
Visual attributes

Liang, Kongming, Hong Chang, Shiguang Shan, and Xilin Chen. "A Unified Multiplicative Framework for Attribute Learning." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2506-2514. 2015.

Abdullah Jamal
[Slides]

04/07
Visual attributes

Sudowe, Patrick, Hannah Spitzer, and Bastian Leibe. "Person Attribute Recognition with a Jointly-trained Holistic CNN Model." In ChaLearn Looking at People Workshop at ICCV 2015.

David Hill &
Samer Iskander
[Slides]
[Keras LSTM Demo Code]

04/12
Visual speech recognition

Sui, Chao, Mohammed Bennamoun, and Roberto Togneri. "Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines." In Proceedings of the IEEE International Conference on Computer Vision, pp. 154-162. 2015.

Mahdi Kalayeh & Javier Lores
[Slides]
[Chainer LSTM Demo Code, Data available upon request]

04/14
Action

Oh, Junhyuk, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, and Satinder Singh. "Action-conditional video prediction using deep networks in atari games." In Advances in Neural Information Processing Systems, pp. 2845-2853. 2015.

Harish RaviPrakash
[Slides]

04/19
Egocentric videos

Li, Yin, Zhefan Ye, and James M. Rehg. "Delving into egocentric actions." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 287-295. 2015.

Suhas Nithyanand
[Slides]

04/21
Egocentric videos

Bambach, Sven, Stefan Lee, David Crandall, and Chen Yu. "Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions." In Proceedings of the IEEE International Conference on Computer Vision, 2015.

Aisha Urooj
[Slides]

04/26
Deep forest

Kontschieder, Peter, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. "Deep Neural Decision Forests." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1467-1475. 2015.

Fareeha Irfan
[Slides]
Project II
Due: 04/27, 5:00pm
04/28



Project II presentations, 1:00pm--3:50pm