Visão Computacional
This is the homepage for the undergraduate computer vision course taught in 2018/2 at [http://pt.wikipedia.org/wiki/IPRJ IPRJ]/[http://pt.wikipedia.org/wiki/IPRJ UERJ]. This year we will focus on programming 3D technology useful for robotics, motion capture, and augmented reality applications such as the Kinect, [https://www.microsoft.com/en-us/hololens Hololens], 3D scanning in Google Street View, and Glass.
 
* Links to previous years of the course: [[Visão_Computacional_2013|2013]]


[[Imagem:Hologirl.jpeg|right|550px]]
== General Information ==
* Instructor: prof. [http://rfabbri.github.io Ricardo Fabbri], Ph.D. (Brown University)
* Period: Semester 2018/1 (ending August 2018), for Computer Engineering students in their 10th semester
* Tuesdays and Fridays, 3:10pm-5:00pm, room 207
* Forum for file exchange and discussion: google groups: iprj-visao-2018@googlegroups.com
* Chat: IRC #labmacambira for random chat


=== Prerequisites ===


The student should know basic programming concepts and be able to quickly learn languages on demand during the course.
This year's focus on 3D demands some C++ programming, which will be
reviewed, but familiarity with C is required. Prior experience with linear
algebra and vector calculus is highly recommended, and both will also be reviewed as needed.


=== Software ===
This year we will focus on practical C++ programming, since both performance and scalability are needed for 3D programming. Auxiliary languages will also be used:
 
* Scripting languages such as Python, [[Scilab]] or Matlab for experimentation and prototyping small ideas
* [[Pd]] (Pure Data) for real-time interactive apps (although the underlying code is C/C++)


=== Bibliography ===
* Main book this year (emphasis on 3D): [http://vision.ucla.edu/MASKS/ An Invitation to 3D Vision, Yi Ma, Stefano Soatto, Jana Kosecka and Shankar Sastry] http://vision.ucla.edu/MASKS/MASKS.jpg
* Second main book (also emphasizing 3D): [http://www.robots.ox.ac.uk/~vgg/hzbook/ Multiple View Geometry in Computer Vision, Hartley & Zisserman] http://www.robots.ox.ac.uk/~vgg/hzbook/hzcover2.jpg
==== Auxiliary books ====
* Computer Vision, Richard Szeliski, available as a free PDF online: http://szeliski.org/Book
http://szeliski.org/Book/imgs/SzeliskiBookFrontCover.png
* Shape Classification and Analysis, 2nd. Ed., Luciano da Fontoura Costa & Roberto Marcondes Cesar Jr.
** We will use this book for much of the material covered in class


=== Links ===
==== 3D Vision Courses ====
Finding a 3D vision course suitable for undergraduates is difficult, as the subject itself tends to be hard.
* [https://www.cc.gatech.edu/~dellaert/09S-3D/Overview.html Super Cool 3D course] by Frank Dellaert. This is the same type of course as ours, except that we focus on getting structure from motion to work for most problems.
* [http://mesh.brown.edu/3DP/ 3D Photography Course at Brown University] by Gabriel Taubin. This vision course is similar to the present one, except that it works further down the 3D vision pipeline: once the cameras and an initial reconstruction are available, how to obtain better reconstructions and photometry.
==== Other Vision & Related Courses ====
* Computer Vision research course at Brown Engineering, 2013 - [http://vision.lems.brown.edu/courses/engn2560/spring2013 ENGN 2560]
* Image Understanding course at Brown Engineering (basic to intermediate level)
** 2013 course http://mesh.brown.edu/engn1610
** 2011 course http://vision.lems.brown.edu/engn161/fall2011
* Computer Vision course at Brown Computer Science (basic to intermediate) http://cs.brown.edu/courses/cs143
* [[SP|Stochastic processes course]] by prof. Fabbri
* [[PP|Parallel programming course]] by prof. Fabbri
* [[ALN|Numerical linear algebra course]] by prof. Fabbri: many useful concepts for 3D computer vision and graphics
* [[VC|Computer graphics course]] by prof. Fabbri
* [[PT|Pattern Theory course]] by prof. Fabbri: machine learning and relevant stochastic techniques for computer vision
* [[OpenCV]], [[VXL]], [[SIP]]: computer vision libraries in C/C++ and Scilab.


=== Exams ===
<b>P1:</b> 16Aug18. Signature/attendance required for the course.
Final/Sub:


== Homework ==
=== Assignment 1 ===
* [[Media:L1-visao-2018.pdf|Exercise list for Soatto's book, chapter 2]]
* Due date: hand in at the beginning of the P1 exam


== Practice Labs ==
=== Download ===
All lab material (images, pdfs, etc.) can be downloaded and updated through [[Git]] [https://github.com/rfabbri/vision-course]:
   git clone https://github.com/rfabbri/vision-course.git
* Use your language of choice to do the homework. I suggest Scilab, Matlab or Python. This year, most students have chosen Python.
* This year, since we'll be focusing on 3D, you only need to do Lab1 to get started:


=== Lab1: Point-wise Processing ===
* [https://github.com/rfabbri/vision-course/blob/master/lab1/lab1-visao-computacional-IPRJ-2013-v1.pdf?raw=true Assignment handout (in Portuguese)] (a minimal sketch of a point-wise operation is given right after this list)
* The necessary images are available in the [[Git]] repository [https://github.com/rfabbri/vision-course/tree/master/lab1/figs]
* Due date: 9Aug18, until midnight. Late submissions will be accepted, but with a grade penalty.
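For reference, here is a minimal sketch of what a point-wise (pixel-by-pixel) operation looks like, in Python with OpenCV and NumPy. The image filename below is a hypothetical placeholder; use the images from the lab repository and implement the operations actually asked for in the handout:

 import cv2
 import numpy as np
 # Load one of the lab images as grayscale ("example.png" is a placeholder name)
 img = cv2.imread("vision-course/lab1/figs/example.png", cv2.IMREAD_GRAYSCALE)
 # Point-wise operations: each output pixel depends only on the same input pixel
 negative = 255 - img                                      # intensity negative
 binary = np.where(img > 128, 255, 0).astype(np.uint8)     # thresholding
 gamma = (255.0 * (img / 255.0) ** 0.5).astype(np.uint8)   # gamma correction
 cv2.imwrite("negative.png", negative)
 cv2.imwrite("binary.png", binary)
 cv2.imwrite("gamma.png", gamma)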


==== Instructions for submission ====
* The solution must be <b>typed</b> up in a report and submitted in <tt>pdf</tt> format.
* Also include all source code and generated data
* Send a single zipped file with everything (scripts, report, etc.) by email, named in the format:
  <sobrenome>-<nome>-visao-computacional-lab<numero_lab>.zip


== Projects ==


=== 3D Reconstruction for Robotics ===
A robot playing robot soccer needs to measure its position within the soccer field,
recognize and track the position of other robots and of a ball relative to itself and to the field.
If we only want to use regular, cheap cameras for this (the robot's eyes), how should we proceed with the software? We could attempt to use sparse structure-from-motion techniques, which are now widely available.
This has the advantage of robustness for tracking relative pose and provides some
geometry, as opposed to generating and stitching depth maps.

In this project, we will explore the use of sparse 3D point cloud technology as well as
curve-drawing technology for robotics.  Our basic setting, to start with, is a
robotic arm and a single webcam. The webcam reconstructs parts / points / curve
fragments of rigid objects from a video stream. It also determines its position
relative to the arm by looking at the arm moving in front of the camera and
reconstructing points or curves on it.  Finally, it expresses the 3D
information relative to the arm's coordinate system so that the arm can actuate
on the object (in robot soccer, this could be any action such as kicking a ball
or defending the goal).

The main thing we need is a module that takes in a video stream of a moving
object, and outputs a 3D reconstruction relative to the first frame.  You can
think of an object being rotated in front of the camera, and a 3D model being
progressively built, as if the robot is manipulating the object to see all
angles and get a better picture of its structure.

This module, which generates 3D from a video stream while figuring out the relative position of the object at each frame, is a sparse 3D reconstruction and camera estimation system known as structure from motion.
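To make the idea concrete, here is a minimal two-view sketch of this pipeline in Python with OpenCV (a recent version with SIFT built in). It is only an illustration, not the module itself: the frame filenames and the intrinsic matrix K are hypothetical placeholders, and a real system would chain this over many frames and refine the result with bundle adjustment.

 import cv2
 import numpy as np
 # Hypothetical inputs: two frames of the video and a made-up 3x3 intrinsic matrix K
 img1 = cv2.imread("frame000.png", cv2.IMREAD_GRAYSCALE)
 img2 = cv2.imread("frame001.png", cv2.IMREAD_GRAYSCALE)
 K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
 # 1. Detect and match features between the two frames
 sift = cv2.SIFT_create()
 kp1, des1 = sift.detectAndCompute(img1, None)
 kp2, des2 = sift.detectAndCompute(img2, None)
 matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
 pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
 pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
 # 2. Estimate the essential matrix and the relative camera pose (R, t)
 E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
 _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
 # 3. Triangulate a sparse point cloud in the coordinate system of the first frame
 P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
 P2 = K @ np.hstack([R, t])
 Xh = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # homogeneous 4xN
 X = (Xh[:3] / Xh[3]).T                              # Nx3 points, up to scale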
==== Part 1 ====
# To practice, [https://drive.google.com/open?id=1xfeO1wVeDUs0NoSsw8TT5Gwnp3cLxHo9 download the "Kermit the Frog" test dataset here] (copied from [http://www.cs.cornell.edu/~snavely/bundler/ Bundler]).
# Download the [https://colmap.github.io COLMAP] 3D reconstruction and structure-from-motion software (use Linux), and also [http://www.regard3d.org/ Regard3D] (an equivalent command-line invocation for COLMAP is sketched after this list)
# Run a 3D reconstruction and visualize it.
## With COLMAP
### Go to Reconstruction -> Automatic Reconstruction
### Set the image folder as `kermit` (where the `.jpg` images are stored)
### Set the workspace folder as any folder (I create a folder `kermit-work` for that)
### Click "Run"
### Visualize the reconstruction. Go to Render -> Render options, and increase the point size
## Take some screenshots and put them in your report
## Play with some parameters
### Reconstruction -> Automatic Reconstruction -> Dense model (needs CUDA)
## With [http://www.regard3d.org/ Regard3D] (I've obtained nicer results more easily with Regard3D)
### Follow [http://www.regard3d.org/index.php/documentation/tutorial the tutorial]
## Generate and visualize a dense point cloud
## Export to MeshLab and visualize the cameras and the 3D reconstruction
# +1 Bonus point: generate your own dataset related to robot soccer and perform all steps above
# +1 Bonus point: Reconstruct/visualize the dataset using [http://ccwu.me/vsfm/ Visual SfM] or [https://www.blender.org/ Blender]
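Depending on the COLMAP version, the automatic reconstruction can also be run from the command line instead of the GUI, roughly as follows (the folder names are placeholders matching the GUI steps above):

 colmap automatic_reconstructor --workspace_path kermit-work --image_path kermit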
==== Part 2 ====
# Build LEMSVXD (copy the code from the professor)
# Compute curves for each view
# Visualize epipolar lines across many views
## Build GUI at `/Users/rfabbri/cprg/vxlprg/lemsvpe/lemsvxl/contrib/rfabbri/mw/app`
## Open the dataset on the GUI using the command `sg *`
## Start the Epipolar GUI tool
## Click on an image, and visualize the epipolar lines
## What do you think are the epipolar errors? (a sketch of how epipolar error is measured is given after this list)
# Visualize the 3D reconstructed curves
# Perform 3D reconstruction by clicking on corresponding curves using the GUI
# Generate your own 3D curve-based reconstruction & visualize it
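To make the notion of epipolar error concrete: given the fundamental matrix F between two views, a point x in the first image maps to the epipolar line l' = F x in the second image, and the epipolar error of a supposed correspondence x' is its distance to that line. Below is a minimal sketch in Python; the matrix F and the point pair are made-up placeholders (in the GUI, F comes from the calibrated cameras of the dataset):

 import numpy as np
 def epipolar_error(F, x1, x2):
     # Distance in pixels from x2 to the epipolar line of x1 in the second image
     x1h = np.array([x1[0], x1[1], 1.0])
     x2h = np.array([x2[0], x2[1], 1.0])
     line = F @ x1h                     # epipolar line (a, b, c): a*u + b*v + c = 0
     return abs(line @ x2h) / np.hypot(line[0], line[1])
 # Hypothetical fundamental matrix and a clicked point pair (placeholders)
 F = np.array([[0.0, -1e-6, 1e-3], [1e-6, 0.0, -2e-3], [-1e-3, 2e-3, 1.0]])
 print(epipolar_error(F, (320.0, 240.0), (335.0, 250.0)))  # a few pixels suggests a good match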
==== More information ====
* Whoever completes all steps will score a 10
* Bonus: whoever runs the pipeline on their own dataset (for example, an object rotating in front of a robot) will receive a bonus of 3 points over the maximum project grade.
* Due date: 20Aug18 for those who will not take the final exam, or the date of the P1 for those who want to take the final exam.


=== Optional: Toy Piano ===
Build an interactive toy piano or toy keyboard where colored objects represent each letter of the alphabet; the user touches an object and the computer reacts to the person's touch. To make this easy and cheap, it could work with a webcam and color detection, and the person/baby could use a black glove/sock.
https://lh4.googleusercontent.com/-F2NVFCF2sgc/TXzcfLA39XI/AAAAAAAAARs/ty1fbFrCSac/s400/letter+mat.jpg
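A minimal sketch of the color-detection idea in Python with OpenCV, assuming a webcam and a single colored key; the HSV range and the pixel-count threshold are made-up placeholders that would have to be tuned for the real objects and lighting:

 import cv2
 import numpy as np
 cap = cv2.VideoCapture(0)            # default webcam
 lower = np.array([100, 120, 80])     # hypothetical HSV range for a "blue" key
 upper = np.array([130, 255, 255])
 while True:
     ok, frame = cap.read()
     if not ok:
         break
     hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
     mask = cv2.inRange(hsv, lower, upper)
     # When the colored key is mostly covered by the hand/glove, treat it as "pressed"
     if cv2.countNonZero(mask) < 500:   # pixel threshold: tune experimentally
         print("key pressed")           # here one would trigger the corresponding sound
     cv2.imshow("mask", mask)
     if cv2.waitKey(30) & 0xFF == 27:   # Esc quits
         break
 cap.release()
 cv2.destroyAllWindows()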


== Evaluation Criteria ==
* Grade = 70% projects and 30% tests
* '''Bonus:''' the top 2 projects that reach a level of excellence will earn +3 points on the final average


[[Category:IPRJ]] [[Category:Lab Macambira]]
