Computer Vision (Visão Computacional)

From Pontão Nós Digitais

This is the homepage for the undergraduate computer vision course being taught in 2018/2 at IPRJ/UERJ. This year will focus on programming 3D technology useful for robotics, motion capture, and augmented reality, as found in Kinect, HoloLens, 3D scanning in Google Street View, and Glass.

  • Links to previous years of the course: 2013

General Information

  • Instructor: prof. Ricardo Fabbri, Ph.D. Brown University
  • Period: Semester 2018/1 (ending August 2018), for Computer Engineering students in their 10th semester
  • Tuesdays and Fridays, 3:10pm-5:00pm, room 207
  • Forum for file exchange and discussion: Google Groups
  • Chat: IRC channel #labmacambira, for casual conversation


The student should know basic programming concepts and be able to learn languages quickly on demand. This year's focus on 3D demands some C++ programming, which will be reviewed, but familiarity with C is required. Prior experience with linear algebra and vector calculus is highly recommended; both will also be reviewed as needed.


This year will focus on practical C++ programming, since 3D programming demands both performance and scalability. Auxiliary languages will also be used:

  • Scripting languages such as Python, Scilab or Matlab for experimentation and prototyping small ideas
  • Pd (Pure Data) will be used for real-time interactive apps (although the underlying code is C/C++)


Auxiliary books


  • Shape Classification and Analysis, 2nd. Ed., Luciano da Fontoura Costa & Roberto Marcondes Cesar Jr.


3D Vision Courses

Finding a 3D vision course suitable for undergraduates is hard, just as the subject itself tends to be hard.

  • Super Cool 3D course by Frank Dellaert. This is the same type of course as ours, except that we focused on getting structure from motion to work for most problems.
  • 3D Photography Course at Brown University by Gabriel Taubin. This vision course is similar to the present course, except that it works later in the 3D vision pipeline: once we already have the 3D cameras and an initial reconstruction, it covers how to get better reconstructions and photometry.

Other Vision & Related Courses


P1: 16Aug18. Signature / attendance required for the course. Final/Sub:


Assignment 1

Practice Labs


All lab material (images, pdf, etc) can be downloaded and updated through Git [1]:

 git clone
  • Use your language of choice to do the homework. I suggest Scilab, Matlab or Python. This year, most students have chosen Python.
  • This year, since we'll be focusing on 3D, you only need to do Lab1 to get started:

Lab1: Point-wise processing

  • Assignment statement
  • The necessary images are available at Git [2]
  • Due date: 9Aug18 until midnight. Late submissions will be accepted, but only with a grade reduction.
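
To make "point-wise" concrete: each output pixel depends only on the input pixel at the same position, so any such operation on an 8-bit image can be tabulated once for the 256 gray levels and then applied by lookup. The sketch below is only an illustration in plain Python (no image library); the function names, the tiny test image, and the three operations chosen are assumptions, not the tasks in the assignment statement.

```python
# Point-wise processing sketch: output pixel = f(input pixel), nothing else.
# All names and the sample operations are illustrative assumptions.

def build_lut(func):
    """Precompute func for all 256 gray levels, clamped to [0, 255]."""
    return [max(0, min(255, int(round(func(v))))) for v in range(256)]

def apply_pointwise(image, lut):
    """Apply a lookup table to a grayscale image stored as a list of rows."""
    return [[lut[pixel] for pixel in row] for row in image]

# Three common point-wise operations expressed as lookup tables.
negative_lut = build_lut(lambda v: 255 - v)
gamma_lut = build_lut(lambda v: 255.0 * (v / 255.0) ** 0.5)  # gamma 0.5 brightens
threshold_lut = build_lut(lambda v: 255 if v >= 128 else 0)

image = [[0, 64, 128], [192, 255, 32]]  # tiny 2x3 grayscale test image

print(apply_pointwise(image, negative_lut))
print(apply_pointwise(image, threshold_lut))
```

With Scilab, Matlab or NumPy the same idea is usually written as a single vectorized expression over the whole image array.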

Instructions for submission

  • The solution must be typed up as a report and submitted in pdf format.
  • Also include all source code and generated data
  • Send a zip file with everything (Scilab scripts, report, etc.) by email, in the format:


3D Reconstruction for Robotics

A robot playing robot soccer needs to measure its position within the soccer field, and to recognize and track the position of other robots and of the ball relative to itself and to the field. If we only want to use regular, cheap cameras for this process (robot eyes), how are we to proceed with the software? We could attempt to use sparse structure from motion techniques, which are now widely available. This has the advantage of robustness for tracking relative pose and provides some geometry, as opposed to generating and stitching depth maps.

In this project, we will explore the use of sparse 3D point cloud technology as well as curve-drawing technology for robotics. Our basic setting, to start with, is a robotic arm and a single webcam. The webcam reconstructs parts / points / curve fragments of rigid objects from a video stream. It also determines its position relative to the arm by looking at the arm moving in front of the camera and reconstructing points or curves on it. Finally, it expresses the 3D information relative to the arm's coordinate system so that the arm can act on the object (in robot soccer, this could be any action such as kicking a ball or defending the goal).
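
The last step, expressing camera-frame 3D points in the arm's coordinate system, is a rigid change of coordinates X_arm = R X_cam + t. Below is a minimal sketch, assuming the rotation R and translation t of the camera relative to the arm have already been estimated and that the overall scale is known (a single camera recovers geometry only up to scale). The pose values and function names are made up for illustration.

```python
# Rigid change of coordinates from the camera frame to the arm frame.
# R_arm_cam and t_arm_cam are hypothetical, pre-estimated values.

def mat_vec(R, x):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

def camera_to_arm(point_cam, R_arm_cam, t_arm_cam):
    """Return X_arm = R_arm_cam * X_cam + t_arm_cam."""
    rotated = mat_vec(R_arm_cam, point_cam)
    return [rotated[i] + t_arm_cam[i] for i in range(3)]

# Assumed pose: camera rotated 90 degrees about the arm's z axis,
# with its center 0.5 m along the arm's x axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.5, 0.0, 0.0]

ball_cam = [1.0, 0.0, 2.0]           # a reconstructed point, camera frame
ball_arm = camera_to_arm(ball_cam, R, t)
print(ball_arm)
```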

The main thing we need is a module that takes in a video stream of a moving object, and outputs a 3D reconstruction relative to the first frame. You can think of an object being rotated in front of the camera, and a 3D model being progressively built, as if the robot is manipulating the object to see all angles and get a better picture of its structure.

This module of generating 3D from a video stream, while figuring out the relative position of the object at each frame, is a sparse 3D reconstruction and camera estimation system called structure from motion.
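
Once the camera position is known at two frames, the geometric core of sparse reconstruction is triangulation: intersecting the two viewing rays of a matched image point. The sketch below uses the simple midpoint method (the midpoint of the shortest segment between the two rays), chosen here for brevity; it is not claimed to be what any particular structure from motion package implements. Camera centers and ray directions are assumed to be given in a common coordinate system.

```python
# Midpoint triangulation of one 3D point from two viewing rays.
# Ray k is c_k + s * d_k, with camera center c_k and direction d_k.

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom      # parameter along ray 1
    t = (a * e - b * d) / denom      # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

# Two cameras one unit apart, both looking at the point (1, 2, 5).
X = triangulate_midpoint([0.0, 0.0, 0.0], [1.0, 2.0, 5.0],
                         [1.0, 0.0, 0.0], [0.0, 2.0, 5.0])
print(X)
```

With noisy measurements the two rays no longer intersect, which is why the midpoint (or a more careful reprojection-error minimization) is used instead of an exact intersection.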

Part 1

  1. To practice, download the "Kermit the Frog" test dataset here (copied from Bundler).
  2. Download COLMAP 3D reconstruction and structure from motion software (use Linux), and also Regard3D
  3. Run a 3D reconstruction and visualize it.
    1. With COLMAP
      1. Go to Reconstruction -> Automatic Reconstruction
      2. Set the image folder as `kermit` (where the `.jpg` images are stored)
      3. Set the workspace folder as any folder (I create a folder `kermit-work` for that)
      4. Click "Run"
      5. Visualize the reconstruction. Go to Render -> Render options, and increase the point size
    2. Take some screenshots and put them in your report
    3. Play with some parameters
      1. Reconstruction -> Automatic Reconstruction -> Dense model (needs CUDA)
    4. With Regard3D (I've obtained nicer results more easily with Regard3D)
      1. Follow the tutorial
    5. Generate and visualize a dense point cloud
    6. Export to MeshLab and visualize the cameras and 3D reconstruction
  4. +1 Bonus point: generate your own dataset related to robot soccer and perform all steps above
  5. +1 Bonus point: Reconstruct/visualize the dataset using VisualSFM or Blender
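
The COLMAP GUI steps above can also be scripted, which helps when repeating the experiment on your own dataset. The sketch below only builds and prints the command line; it assumes the `colmap` executable is on your PATH and that your version provides the `automatic_reconstructor` command with the `--image_path`, `--workspace_path` and `--dense` options (check `colmap automatic_reconstructor -h`). The helper name is made up, and the folder names are the ones used in the steps above.

```python
# Build the command line for COLMAP's automatic reconstruction.
# Assumption: the installed COLMAP accepts these options; verify with -h.
import shlex

def colmap_auto_command(image_dir, workspace_dir, dense=False):
    """Return the argument list for a COLMAP automatic reconstruction."""
    return [
        "colmap", "automatic_reconstructor",
        "--image_path", image_dir,
        "--workspace_path", workspace_dir,
        # Dense reconstruction needs CUDA, so it is off by default here.
        "--dense", "1" if dense else "0",
    ]

cmd = colmap_auto_command("kermit", "kermit-work")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires COLMAP to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```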

Part 2

  1. Build LEMSVXD (copy the code from the professor)
  2. Compute curves for each view
  3. Visualize epipolar lines across many views
    1. Build GUI at `/Users/rfabbri/cprg/vxlprg/lemsvpe/lemsvxl/contrib/rfabbri/mw/app`
    2. Open the dataset on the GUI using the command `sg *`
    3. Start the Epipolar GUI tool
    4. Click on an image, and visualize the epipolar lines
    5. What do you think are the epipolar errors?
  4. Visualize the 3D reconstructed curves
  5. Perform 3D reconstruction by clicking on corresponding curves using the GUI
  6. Generate your own 3D curve-based reconstruction & visualize it
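
For the question about epipolar errors, one common measure is the distance, in pixels, from a point in the second image to the epipolar line of its match in the first image. The sketch below assumes a fundamental matrix F following the convention x2^T F x1 = 0; the example F corresponds to an idealized rectified pair (pure horizontal translation), not to the course dataset, and the function names are illustrative.

```python
# Epipolar line and point-to-line epipolar error for one correspondence.
import math

def epipolar_line(F, x):
    """Line l' = F x in image 2 for pixel x = (u, v) in image 1.

    Returns (a, b, c) such that a*u' + b*v' + c = 0.
    """
    xh = (x[0], x[1], 1.0)
    return tuple(sum(F[i][j] * xh[j] for j in range(3)) for i in range(3))

def epipolar_error(F, x1, x2):
    """Distance in pixels from x2 to the epipolar line of x1."""
    a, b, c = epipolar_line(F, x1)
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * x2[0] + b * x2[1] + c) / norm

# Idealized rectified pair: epipolar lines are the image rows (v' = v).
F_rectified = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

print(epipolar_line(F_rectified, (100.0, 50.0)))
print(epipolar_error(F_rectified, (100.0, 50.0), (80.0, 52.0)))
```

In this rectified example the error is simply the row difference between the two points, which makes it easy to check by hand.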

More information

  • Whoever completes all steps will score a 10
  • Bonus: whoever runs on their own dataset will earn a bonus of 3 points over the maximum project grade. An example would be an object rotating in front of a robot.
  • Due date: 20Aug18 for those who will not take the final exam, or the date of P1 for those who want to take the final exam.

Optional: Toy Piano

Build an interactive toy piano or toy keyboard in which colored objects represent each letter of the alphabet; the user touches an object and the computer reacts to the person's touch. To make this easy and cheap, it could work with a webcam and color detection, and the person/baby could wear a black glove/sock.
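
One possible way to approach the color detection, sketched with the standard library only: average the pixels inside each object's image region, convert to HSV, and treat a region that turns dark as covered by the black glove. The key layout, thresholds and function names below are all assumptions for illustration; a real version would read the regions from webcam frames.

```python
# Color-based "touch" detection sketch for the toy piano.
# KEY_HUES and all thresholds are hypothetical and need tuning.
import colorsys

KEY_HUES = {"A": 0.0, "B": 1.0 / 3.0, "C": 2.0 / 3.0}  # red, green, blue

def mean_rgb(pixels):
    """Average a list of (r, g, b) tuples with components in 0..255."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def classify_region(pixels, dark_value=0.2, hue_tolerance=0.08):
    """Return 'covered', the visible key letter, or None if unknown."""
    r, g, b = mean_rgb(pixels)
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < dark_value:
        return "covered"            # dark region: glove is over the object
    for letter, hue in KEY_HUES.items():
        diff = abs(h - hue)
        # Hue is circular, so compare along the shorter arc.
        if min(diff, 1.0 - diff) < hue_tolerance and s > 0.4:
            return letter
    return None

def is_pressed(previous_state, current_state):
    """A key is pressed when a visible key becomes covered."""
    return previous_state in KEY_HUES and current_state == "covered"

before = classify_region([(250, 10, 10), (240, 20, 15)])   # reddish region
after = classify_region([(10, 12, 8)] * 4)                 # glove over it
print(before, after, is_pressed(before, after))
```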


Evaluation Criteria

  • Grade = 70% projects and 30% tests
  • Bonus: the top 2 projects that reach a level of excellence will earn +3 points on the final average