Introduction to Parallel Computing
This is the main page of a graduate-level course in parallel computing taught in 2012/2 at the Polytechnic Institute IPRJ/UERJ. The material is generally useful for advanced programmers in scientific and multimedia computing.
General Info
- Instructor: prof. Ricardo Fabbri, Ph.D. Brown University
- Meeting times: Tuesdays 12:30pm-2pm, Thursdays 2:20pm-4pm
- Evaluation criteria: one quiz at the end of the term (60%), plus practical projects (40%).
- Forum for file exchange and discussion: uerj.tk
Prerequisites
- Linux - intermediate to advanced (will be reviewed as needed) - read Recommended Reading
- C/C++ - intermediate to advanced (will be reviewed) - read Recommended Reading
- Basic understanding of computer architecture - read "Computer Systems: A Programmer's Perspective" listed at Recommended Reading.
Approximate Content
The course focuses on software techniques for parallel computing. We aim at a comprehensive treatment of different types of practical parallel programming techniques:
- process-oriented parallel programming
- thread programming/thread safety
- single-core vector instructions
- multi-processor and multi-core programming
- mapreduce/hadoop
- MPI
- Cuda/OpenCL
- Dataflow techniques
Each of the above techniques would merit a course of its own, as is done at many of the best universities. We therefore aim at attaining practical familiarity with each in the first half of the course, and will specialize in the later part of the course so as to support the students' graduate research.
Main Resources
- Textbooks
- 1st part of the course: "Is Parallel Programming Hard, And, If So, What Can You Do About It?" - Paul E. McKenney / IBM (editor).
- For MPI: "An Introduction to Parallel Programming" , by Peter Pacheco[1]
- For Cuda/OpenCL: Programming Massively Parallel Processors: A Hands-On Approach[2]
- General textbook: A. Grama, A. Gupta, G. Karypis, V. Kumar, Introduction to Parallel Computing, Second Edition, Addison-Wesley, 2003.
- Algorithms-oriented textbook: Algorithms: sequential, parallel, and distributed, Kenneth A. Berman, Jerome L. Paul
- For more theoretical background, see the chapter on parallel algorithms in Cormen's classic algorithms book.
- Mapreduce, GFS, and Bigtable:
- Papers from Google
- http://code.google.com/edu/parallel/mapreduce-tutorial.html
- Wikipedia
- Hadoop documentation
- Presentations from IBM and Intel: Cilk, etc.
- Wikipedia pages
- Rice lecture notes on Parallel Computing [3]
- Other Resources
- http://www-users.cs.umn.edu/~karypis/parbook/
- http://www.cse.iitd.ernet.in/~subodh/courses/CSL860/
- Running Hadoop on Ubuntu Linux, multi-node and single-node cluster tutorials [4]
- Short course on CUDA at the EMC 2012 conference: http://josericardojunior.com/?p=219
Lectures
NOTE: There will be no lectures from Oct 4 - Oct 19 2012
Partial listing & Tentative Outline
- Overview of parallel computing: https://computing.llnl.gov/tutorials/parallel_comp/
- Review of Linux:
- See the book Running Linux from the Recommended Reading
- Review of C/C++ (finished 28/Aug)
- Fundamental programming techniques: processes and threads
- Read The Unix Programming Environment for some classic multi-process programming[5]
- What is a process: http://linuxgazette.net/133/saha.html
- 30/Aug: We'll be following the McKenney book, ch. 3, at the lab.
- Up to (and including) 18/Sept: We followed McKenney's code samples for counting algorithms and his threading API in detail
- 20/Sept: Finished statistical counters and showed how to read the assembly language generated by GCC, and how to disassemble binary objects
- 25/Sept: Finished the theoretical thread content for the course: up through ch. 5 of McKenney's book, plus semaphores and thread safety.
- Homework 2 (see below)
- Cuda/OpenCL (02/Oct)
- Project 1 handout (see below) due Oct. 23
- Mapreduce/Hadoop
- MPI
Homework
Homework 1
Our institute is planning on buying a supercomputer. Therefore, our first homework will consist of studying these systems in detail.
- The class shall be divided into groups of 2 people.
- Each group must pick one top 500 (http://www.top500.org) supercomputer and write on this wiki:
- Team number:
- Supercomputer:
- Each class will start with a 20-minute presentation by a random group (or a volunteer group).
- A handout must also be provided on Tuesday, August 28
- Your project must not be a plain copy from Wikipedia!
Contents of the presentation and report
- Overview of the system
- Glossary of terms
- How to program for the system
- Software infrastructure
- How does the system differ from a conventional cluster?
Groups
- Team 1: Julio Stutz & Joel Sanchez
- Team 2: Claudir Oliver & Luiz Rosalba
- Team 3: Mateus Guida
- Team 4: Lyvia Aloquio & Stella Oggioni
- Supercomputer: Grifo04 (PETROBRAS/Brazil)
- Team 5: Ricardo Dias
- Team 6: Frederico Santos
- Supercomputer: Helios (International Fusion Energy Research Centre (IFERC), EU(F4E) - Japan Broader Approach collaboration) (report)
- Team 7: Thiago Luiz
- Supercomputer: Pleiades - NASA/Ames Research Center/ NAS - United States.
Homework 2: limit counters
Describe the limit counter implementations of sec. 4.3 in McKenney's book, answering:
- list and explain the advantages of each counter
- explain the main concepts of each method in your own words
Due date: Oct 1 (was: Sept 27)
Project 1
- The class will be divided into interest groups
- Each interest group will pick one of the following technologies, or else propose another one of its own choosing.
- mapreduce/hadoop
- MPI
- Cuda/OpenCL
- Hybrid technologies (Distributed + Cuda)
- Languages designed for concurrency: Google Go, Erlang, etc.
- Programming for the Playstation 3 parallel architecture[9]
- OpenGL/GLSL
- Google Pregel
- The project will consist of a series of presentations, given individually by the group members.
- A grade will be assigned to each presentation individually
Groups to Project 1
- MPI: Claudir Oliveira - 25/10/2012 (slides) (code)
- Hybrid technologies (Distributed + Cuda): Ricardo Dias (slides) (code)
- Cuda/OpenCL: Joel Sanchez (slides) (code)
- OpenGL/GLSL: Luiz Rosalba - 08/11/2012 (slides)
- mapreduce/hadoop : Julio Stutz (slides)
- Google Go : Mateus Guida (slides)
- Programming for the Playstation 3 parallel architecture: Thiago Luiz
Project 2
- This project will consist of a practical programming problem from the student's own research. The student is required to describe a problem from their research and present a parallel implementation of some aspect of it
- This will be evaluated through actual code and an accompanying monograph
Topic Suggestions
For those who haven't found a suitable research topic with their own advisors, here is a list of possible ideas. I can offer some guidance on these during the project, but please let your advisors know about this. Also, not every idea lends itself easily to a project; some may require thinking on your part about goals and objectives.
Try to email the original authors of the relevant papers for your project in order to get code. You have to understand the paper, understand the code, and run the code. You will also have to write code, or modify the existing code, although your time for this is limited.
- "Design and Implementation of a Wide Area, Large-Scale Camera Network", Kuo et. al. 2010
- Technology: distributed computing / hadoop
- Abstract
We describe a wide area camera network on a campus setting, the SCALLOPSNet (Scalable Large Optical Sensor Network). It covers with about 100 stationary cameras an expansive area that can be divided into three distinct regions: inside a building, along urban paths, and in a remote natural reserve. Some of these regions lack connections for power and communications, and, therefore, necessitate wireless, battery-powered camera nodes. In our exploration of available solutions, we found existing smart cameras to be insufficient for this task, and instead designed our own battery-powered camera nodes that communicate using 802.11b. The camera network uses the Internet Protocol on either wired or wireless networks to communicate with our central cluster, which runs cluster and cloud computing infrastructure. These frameworks like Apache Hadoop are well suited for large distributed and parallel tasks such as many computer vision algorithms. We discuss the design and implementation details of this network, together with the challenges faced in deploying such a large scale network on a research campus. We plan to make the datasets available for researchers in the computer vision community in the near future.
- "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining", White, Yeh, Lin and Davis 2010
- Technology: Mapreduce
- Abstract: This work explores computer vision applications of the MapReduce framework that are relevant to the data mining community. An overview of MapReduce and common design patterns are provided for those with limited MapReduce background. We discuss both the high level theory and the low level implementation for several computer vision algorithms: classifier training, sliding windows, clustering, bag-of-features, background subtraction, and image registration. Experimental results for the k-means clustering and single Gaussian background subtraction algorithms are performed on a 410 node Hadoop cluster.
- "Building Rome on a Cloudless Day", Frahm et al, 2012
- Technology: Cuda
- Abstract:
This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC ("cloudless"). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches.
Class Exercises
- In-class assignment (extra 0.1 on the final test): Compare 3 different implementations of statistical counters: array of per-thread variables, eventually consistent counter, and GCC __thread variables
- Only Julio Stutz scored the extra +0.1 on the final exam.
Keywords
Portuguese: Programação Paralela, Introdução à Computação Paralela