Visão Computacional

From Pontão Nós Digitais
This is the homepage for the undergraduate computer vision course taught in 2018/2 at [http://pt.wikipedia.org/wiki/IPRJ IPRJ]/[http://pt.wikipedia.org/wiki/IPRJ UERJ]. This year focuses on programming 3D technology useful for robotics, motion capture, and augmented reality applications such as Kinect, [https://www.microsoft.com/en-us/hololens Hololens], 3D scanning in Google Street View, and Google Glass.
* Links to previous years of the course: [[Visão_Computacional_2013|2013]]


[[Imagem:Hologirl.jpeg|right|550px]]
== General Information ==
* Instructor: prof. [http://rfabbri.github.io Ricardo Fabbri], Ph.D. Brown University
* Period: Semester 2018/1 (ending August 2018), for Computer Engineering students in their 10th semester
* Tuesdays and Fridays, 3:10pm-5:00pm, room 207
* Forum for file exchange and discussion: Google Groups: iprj-visao-2018@googlegroups.com
* Chat: IRC #labmacambira for random chat
 
=== Requisites ===


The student should know basic programming concepts and be able to learn languages quickly on demand. This year's focus on 3D demands some C++ programming, which will be reviewed, but familiarity with C is required. Prior experience with linear algebra and vector calculus is highly recommended; both will also be reviewed as needed.


=== Software ===
This year will focus on practical C++ programming, since both performance and scalability are needed for the demands of 3D programming. Auxiliary languages will be used:
* Scripting languages such as Python, [[Scilab]] or Matlab for experimentation and prototyping small ideas
* [[Pd]] (Pure Data) will be used for real-time interactive apps (although the underlying code is C/C++)


=== Bibliography ===
* Main book this year (emphasizing 3D): [http://vision.ucla.edu/MASKS/ An Invitation to 3D Vision, Yi Ma, Stefano Soatto, Jana Kosecka, and Shankar Sastry] http://vision.ucla.edu/MASKS/MASKS.jpg
* Second main book (also emphasizing 3D): [http://www.robots.ox.ac.uk/~vgg/hzbook/ Multiple View Geometry in Computer Vision, Hartley & Zisserman] http://www.robots.ox.ac.uk/~vgg/hzbook/hzcover2.jpg
==== Auxiliary books ====
* Computer Vision, Richard Szeliski, whose pdf is available online: http://szeliski.org/Book
http://szeliski.org/Book/imgs/SzeliskiBookFrontCover.png
* Shape Classification and Analysis, 2nd. Ed., Luciano da Fontoura Costa & Roberto Marcondes Cesar Jr.


=== Links ===
==== 3D Vision Courses ====
Finding a 3D vision course suitable for undergraduates is hard, just as the subject itself tends to be.
* [https://www.cc.gatech.edu/~dellaert/09S-3D/Overview.html Super Cool 3D course] by Frank Dellaert. This is the same type of course as ours, except that we focus on getting structure from motion to work for most problems.
* [http://mesh.brown.edu/3DP/ 3D Photography Course at Brown University] by Gabriel Taubin. This course is similar to ours, except that it works further along the 3D vision pipeline: once the 3D cameras and an initial reconstruction are available, it covers how to obtain better reconstructions and photometry.


==== Other Vision & Related Courses ====
 
 
* Computer Vision research course at Brown Engineering, 2013 - [http://vision.lems.brown.edu/courses/engn2560/spring2013 ENGN 2560]
* Image Understanding course at Brown Engineering (basic to intermediate level)
** 2011 course http://vision.lems.brown.edu/engn161/fall2011
* Computer Vision course at Brown Computer Science (basic to intermediate) http://cs.brown.edu/courses/cs143
* [[SP|Stochastic processes course]] by prof. Fabbri
* [[PP|Parallel programming course]] by prof. Fabbri
* [[ALN|Numerical linear algebra course]] by prof. Fabbri: many useful concepts for 3D computer vision and graphics
* [[VC|Computer graphics course]] by prof. Fabbri
* [[PT|Pattern Theory course]] by prof. Fabbri: machine learning and relevant stochastic techniques for computer vision
* [[OpenCV]], [[VXL]], [[SIP]]: computer vision libraries in C/C++ and Scilab.


=== Exams ===
<b>P1:</b> 16Aug18. Signature/presence required for the course.
Final/Sub:


== Homework ==
=== Assignment 1 ===
* [[Media:L1-visao-2018.pdf|Exercise list for Soatto's book, chapter 2]]
* Due date: hand in at the beginning of P1
 
== Practice Labs ==
=== Download ===
All lab material (images, pdf, etc) can be downloaded and updated through [[Git]] [https://github.com/rfabbri/vision-course]:
   git clone https://github.com/rfabbri/vision-course.git
* Use your language of choice for the homework. I suggest Scilab, Matlab, or Python. This year, most students have elected Python.
* This year, since we'll be focusing on 3D, you only need to do Lab1 to get started:


=== Lab1: Pointwise Image Processing ===
* [https://github.com/rfabbri/vision-course/blob/master/lab1/lab1-visao-computacional-IPRJ-2013-v1.pdf?raw=true Assignment handout]
* The necessary images are in the [[Git]] repository [https://github.com/rfabbri/vision-course/tree/master/lab1/figs]
* Due date: 9Aug18 until midnight. Late submissions will be accepted, but only with a grade penalty.
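A pointwise (point-to-point) operation, the topic of Lab1, maps each pixel independently of its neighbors. A minimal sketch in Python/NumPy, using a synthetic image in place of the lab images from the Git repository:

```python
import numpy as np

def stretch_contrast(img):
    """Pointwise contrast stretch: map the image's range onto [0, 255]."""
    lo, hi = img.min(), img.max()
    return ((img.astype(float) - lo) / (hi - lo) * 255).astype(np.uint8)

def negative(img):
    """Another classic pointwise operation: invert every pixel."""
    return 255 - img

# Synthetic low-contrast 8-bit image, values squeezed into [100, 150].
img = np.linspace(100, 150, 64, dtype=np.uint8).reshape(8, 8)

out = stretch_contrast(img)
print(out.min(), out.max())    # 0 255
print(negative(img)[0, 0])     # 155
```

The same structure carries over to Scilab or Matlab; only the array syntax changes.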


==== Instructions for submission ====
 
* The solution must be <b>typed</b> up in a report and submitted in <tt>pdf</tt> format.
* Also include all source code and generated data
* Send a single zip file with everything (scilab scripts, report, etc.) by email, named:
  <sobrenome>-<nome>-visao-computacional-lab<numero_lab>.zip


== Projects ==
=== 3D Reconstruction for Robotics ===


A robot playing soccer needs to measure its position within the field, and to recognize and track the positions of other robots and the ball relative to itself and to the field. If we only want to use regular, cheap cameras for this (the robot's eyes), how should we proceed with the software? We could attempt to use sparse structure-from-motion techniques, which are now widely available. These have the advantage of robustness for tracking relative pose and provide some geometry, as opposed to generating and stitching depth maps.


In this project, we will explore the use of sparse 3D point cloud technology as well as
curve-drawing technology for robotics. Our basic setting, to start with, is a
robotic arm and a single webcam. The webcam reconstructs parts/points/curve fragments of rigid objects from a video stream. It also determines its position relative to the arm by looking at the arm moving in front of the camera and reconstructing points or curves on it. Finally, it expresses the 3D information relative to the arm's coordinate system so that the arm can actuate on the object (in robot soccer, this could be any action, such as kicking a ball or defending the goal).
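The last step, re-expressing camera-frame geometry in the arm's coordinate system, is a rigid transform once the camera-to-arm pose is known (e.g., from watching the arm move). A minimal sketch; the rotation and translation values below are made up for illustration:

```python
import numpy as np

# Hypothetical pose of the camera in the arm's frame: R_arm_cam rotates
# camera coordinates into arm coordinates (90 degrees about z here), and
# t_arm_cam is the camera center expressed in the arm frame.
theta = np.pi / 2
R_arm_cam = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0,            0.0,           1.0]])
t_arm_cam = np.array([0.5, 0.0, 0.2])

def cam_to_arm(X_cam):
    """Re-express 3D points (one per row) from the camera frame in the arm frame."""
    return X_cam @ R_arm_cam.T + t_arm_cam

X_cam = np.array([[1.0, 0.0, 2.0]])   # a reconstructed point, camera frame
print(cam_to_arm(X_cam))              # ≈ [[0.5, 1.0, 2.2]]
```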


The main thing we need is a module that takes in a video stream of a moving object and outputs a 3D reconstruction relative to the first frame. You can think of an object being rotated in front of the camera and a 3D model being progressively built, as if the robot were manipulating the object to see all angles and get a better picture of its structure.


This module, which generates 3D from a video stream while figuring out the relative position of the object at each frame, is a sparse 3D reconstruction and camera estimation system known as structure from motion.
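Concretely, structure from motion inverts the pinhole forward model x ~ K(RX + t): given pixel tracks over many frames, it estimates the per-frame pose (R, t) and the 3D points X. A sketch of the forward model only, with made-up intrinsics and motion:

```python
import numpy as np

# Hypothetical intrinsics: 800px focal length, principal point at (320, 240).
K = np.array([[800.,   0., 320.],
              [  0., 800., 240.],
              [  0.,   0.,   1.]])

def project(K, R, t, X):
    """Pinhole model: x ~ K (R X + t), returned in pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

X = np.array([0.1, -0.2, 4.0])      # a 3D point on the moving object
R1, t1 = np.eye(3), np.zeros(3)     # frame 1: the reference camera
a = 0.1                             # frame 2: small made-up rotation about y
R2 = np.array([[ np.cos(a), 0., np.sin(a)],
               [ 0.,        1., 0.       ],
               [-np.sin(a), 0., np.cos(a)]])
t2 = np.array([-0.3, 0.0, 0.0])     # plus a sideways translation

print(project(K, R1, t1, X))        # [340. 200.]
print(project(K, R2, t2, X))        # where the same point lands in frame 2
```

Tools such as COLMAP and Regard3D solve the inverse problem at scale, adding feature matching and bundle adjustment on top.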




==== Part 1 ====
# To practice, [https://drive.google.com/open?id=1xfeO1wVeDUs0NoSsw8TT5Gwnp3cLxHo9 download the "Kermit the Frog" test dataset here] (copied from [http://www.cs.cornell.edu/~snavely/bundler/ Bundler]).
# Download the [https://colmap.github.io COLMAP] 3D reconstruction and structure-from-motion software (use Linux), and also [http://www.regard3d.org/ Regard3D]
# Run a 3D reconstruction and visualize it.
## With COLMAP
### Go to Reconstruction -> Automatic Reconstruction
### Set the image folder as `kermit` (where the `.jpg` images are stored)
### Set the workspace folder as any folder (I create a folder `kermit-work` for that)
### Click "Run"
### Visualize the reconstruction. Go to Render -> Render options, and increase the point size
## Take some screenshots and put them on your report
## Play with some parameters
### Reconstruction -> Automatic Reconstruction -> Dense model (needs CUDA)
## With [http://www.regard3d.org/ Regard3D] (I've obtained nicer results more easily with Regard3D)
### Follow [http://www.regard3d.org/index.php/documentation/tutorial the tutorial]
## Generate and visualize a dense point cloud
## Export to meshlab and visualize the cameras and 3D reconstruction
# +1 Bonus point: generate your own dataset related to robot soccer and perform all steps above
# +1 Bonus point: Reconstruct/visualize the dataset using [http://ccwu.me/vsfm/ Visual SfM] or [https://www.blender.org/ Blender]


==== Part 2 ====
# Build LEMSVXD (copy the code from the professor)
# Compute curves for each view
# Visualize epipolar lines across many views
## Build GUI at `/Users/rfabbri/cprg/vxlprg/lemsvpe/lemsvxl/contrib/rfabbri/mw/app`
## Open the dataset on the GUI using the command `sg *`
## Start the Epipolar GUI tool
## Click on an image, and visualize the epipolar lines
## What do you think are the epipolar errors?
# Visualize the 3D reconstructed curves
# Perform 3D reconstruction by clicking on corresponding curves using the GUI
# Generate your own 3D curve-based reconstruction & visualize it
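The epipolar lines of step 3 come from the epipolar constraint x2^T F x1 = 0. A numerical sketch with a made-up calibrated pair, building F from the essential matrix E = [t]_x R and checking that a projected point lands on its epipolar line:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0., -v[2], v[1]],
                     [v[2], 0., -v[0]],
                     [-v[1], v[0], 0.]])

# Made-up calibrated pair: camera 1 at the origin, camera 2 rotated about y
# and translated; the same (hypothetical) intrinsics K for both views.
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
a = 0.1
R = np.array([[np.cos(a), 0., np.sin(a)], [0., 1., 0.], [-np.sin(a), 0., np.cos(a)]])
t = np.array([-0.3, 0.0, 0.05])

# Fundamental matrix from the essential matrix E = [t]_x R.
Kinv = np.linalg.inv(K)
F = Kinv.T @ skew(t) @ R @ Kinv

# Project one 3D point into both views (homogeneous pixel coordinates).
X = np.array([0.2, -0.1, 5.0])
x1 = K @ X
x1 = x1 / x1[2]
x2 = K @ (R @ X + t)
x2 = x2 / x2[2]

line2 = F @ x1                # epipolar line of x1 in image 2: ax + by + c = 0
print(abs(x2 @ line2))        # ~0: x2 lies on its epipolar line
```

In the GUI, the epipolar error of a clicked point is its distance to this line, |x2 . line2| / sqrt(a^2 + b^2).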


==== More information ====
* Whoever completes all steps will score a 10.
* Bonus: whoever runs it on their own dataset (for example, an object rotating in front of a robot) will get a bonus of 3 points over the maximum project grade.
* Due date: 20Aug18 for those who will not take the final exam, or the date of P1 for those who want to take the final exam.


=== Optional: Toy Piano ===
Build an interactive toy piano or toy keyboard with colored objects representing each letter of the alphabet; the user touches an object and the computer reacts to the person's touch. To make this easy and cheap, it could work with a [[PS3Eye|webcam]] and color detection, and the person/baby could wear a black glove/sock.
 
https://lh4.googleusercontent.com/-F2NVFCF2sgc/TXzcfLA39XI/AAAAAAAAARs/ty1fbFrCSac/s400/letter+mat.jpg
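The color-detection core of this project can be prototyped without a webcam: threshold each RGB frame and take the centroid of the matching pixels as the touched key. A minimal NumPy sketch on a synthetic frame (with a real [[PS3Eye]] stream, frames would come from a capture library such as OpenCV; the thresholds here are made up):

```python
import numpy as np

def find_color_blob(frame, rgb_lo, rgb_hi):
    """Centroid (row, col) of pixels inside an RGB threshold box, else None."""
    mask = np.all((frame >= rgb_lo) & (frame <= rgb_hi), axis=-1)
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())

# Synthetic 100x100 frame: gray background plus a red "letter object" patch.
frame = np.full((100, 100, 3), 128, dtype=np.uint8)
frame[20:30, 40:60] = (220, 30, 30)

centroid = find_color_blob(frame, rgb_lo=(180, 0, 0), rgb_hi=(255, 80, 80))
print(centroid)   # (24.5, 49.5) -- the key/letter being touched
```

A real system would add smoothing, connected components, and per-letter color calibration, but the per-frame logic is this simple.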


== Evaluation Criteria ==
* Grade = 70% projects and 30% exams
* '''Bonus:''' the top 2 projects that reach a level of excellence will earn +3 points on the final average


[[Category:IPRJ]] [[Category:Lab Macambira]]

Current revision as of 09:15, 6 August 2018
