Parallel Programming Notes
(These notes concern the particulars of the parallel programming and execution environments installed on the SIRAF cluster. Since these are notes and not a tutorial or specifications document, the structure will be free form and subject to continual revisions.)
There are many parallel programming resources available on the Internet; a shortlist will appear at the end of this page. We will focus on specific applications available on SIRAF.
Because of the inherent difficulties in parallel programming, many researchers in the past would write serial programs and wait for faster CPUs to deliver increases in performance. However, limitations in current semiconductor technologies have forced the major CPU vendors to release products with multiple computation cores instead of a single core at higher clock speeds. Currently Intel and AMD ship four-core CPUs, with eight-core CPUs planned for the end of 2009. For the foreseeable future, parallel programming will be the only avenue available to achieve higher levels of performance. The good news is that many scientific problems are highly parallel.
The simplest type of parallel programming is executing the same program on different sets of data. Sun Grid Engine supports this type of parallelism via array jobs. See page 72 of the Sun Grid Engine 6 User Guide and the Simple Job Array Howto for more details. The job array facility in SGE requires less work (you manage one job instead of dozens) and is more efficient: SGE can schedule and assign an array job more efficiently than a human being.
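As a rough sketch of how one task of an array job might select its own data set: SGE exports the environment variable SGE_TASK_ID to each task of a job submitted with qsub -t. The script below reads that variable and maps it to an input file. The file naming scheme (input.N.dat) and the function names are hypothetical, and the sketch assumes a Python interpreter is available on the compute nodes.

```python
import os


def current_task_id(environ=os.environ):
    """Return the array task index as an int, or None outside an array job."""
    # SGE sets SGE_TASK_ID to the literal string "undefined" for non-array jobs.
    raw = environ.get("SGE_TASK_ID", "undefined")
    if raw == "undefined":
        return None
    return int(raw)


def input_file_for_task(task_id, pattern="input.{}.dat"):
    """Map a task index to its data file (hypothetical naming scheme)."""
    return pattern.format(task_id)


if __name__ == "__main__":
    task = current_task_id()
    if task is None:
        print("not running as an SGE array task")
    else:
        # Each task works only on the file that matches its own index.
        print("task", task, "processes", input_file_for_task(task))
```

Submitted with something like qsub -t 1-100 job.sh, where job.sh is a hypothetical wrapper that runs this script, SGE would start 100 tasks with SGE_TASK_ID running from 1 to 100, each picking up a different input file.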
For more complicated types of parallelism, unless you have an interest in parallel algorithms and programming, you are better off using parallel applications and libraries implemented by experienced programmers than writing your own. Writing correct parallel code is tricky: since instructions may execute in a different order each time the code is run, problems can be hard to reproduce and debug. Here are some resources available on SIRAF for different types of parallelism.
Instruction Level Parallelism - usually you rely on compiler optimization to ferret out parallelism in your code. You can help the optimizer by writing code which is easy for it to analyze (ref.). The installed GNU compilers are adequate optimizing compilers; however, greater performance may be possible by using the Intel, Portland Group or PathScale commercial compilers.
For some data parallel algorithms, such as FFTs and neural networks, which do not require 64-bit precision, the instruction level parallelism of a GPU is far greater than that of a general purpose CPU.