Contents
Using SIRAF
Programming
SIRAF User Meetings
10-02-09 SIRAF Users Meeting
Topics: Introduction to Statistical Computing using R on SIRAF (R. Zur)
Using R on SIRAF
R is a programming language and software environment for data analysis. It is the de facto standard tool for statisticians. Here are some reasons to use R instead of Matlab:
- It is FREE.
- Packages can be added to it, also for FREE.
- Creates publication-quality graphs
- Common statistical analyses are easy to implement
- C, C++, and Fortran code can be run from R
- R can also be called from Perl and Python
- Programs can be exported for use in LaTeX
There are versions of R for Windows, Mac, and Linux. The Windows version has a GUI; on Linux, everything is done from the command line.
R can function as:
- A calculator on command line: add, subtract, multiply, divide, pi, etc.
- Vectors: seq(), rep()
- Programming language: if/then, loops, etc.
To get help on a function (for example, one called "anova"), type:
>> ?anova
>> help(anova)
>> example(anova)
For more help, visit: www.r-project.org.
R has a pretty good set of random number generators: rnorm, rlnorm, rbinom, rpois. However, R is slow at running code with loops in it. One suggestion is to write that code in C and build libraries that R can access. Lorenzo has a library for proproc in C.
Although parallel programming may exist for R, it sounds hard! You may use array jobs to run many programs in R. Since no license is required, you can run as many as you like without being restricted.
Questions:
What is the difference between R and Matlab?
- R is free, Matlab is not.
- The statistical package for Matlab would cost money, whereas the package for R would be free.
- Recent statistical methods are available in R. In fact, the people who invented them probably programmed them in R, so they are available right away and only through R.
- R does surface plots and contour plots, just like Matlab.
- Documentation in R is pretty good, with PDF files for each package that describe the commands, with citations and usage. The documentation in R, however, requires that you know quite a bit about the statistics before you can use it. The Matlab documentation can sometimes teach you about the statistics when you read it.
Where can I get help?
- There's an online forum, but the people can be mean sometimes if you have no idea what you're doing.
- You can email the authors of the packages, and they may release errata or new packages with the update.
- If you sound like you know what you're doing, they'll be nice and explain stuff.
Adding packages to SIRAF?
- Can be done locally
- Or, email Chun-Wai to put it centrally on SIRAF.
Linux commands
We are making a list of Linux commands for people new to using command line. Please email any questions you have about Linux commands (or suggestions for ones to add to the handbook) to Beverly or Ingrid. Thanks!
08-07-09 SIRAF Users Meeting
Topics: Planning ahead - talks from SIRAF Users about why and how they're using the Cluster
Parallel Programming using CILK ++ (C. Chan)
Parallel Programming using CILK ++
Computers have been made faster not by increasing processor speed, but mainly by adding more processors. This is why parallel programming is an ideal tool for cutting your program's run time. Most programs are written for serial execution. Parallel execution can reduce run time manyfold; however, parallel programming (using OpenMP or MPI) requires writing new code and verifying it. CILK ++ is a jacket for parallel programming that lets the user easily change serial code into parallel code. Although CILK may not deliver the optimum speedup, it delivers a speedup for very little time invested in changing your existing code. A speedup of about 4-16x can be expected from CILK, which is considered acceptable.
History
CILK development was started in 1994 at MIT. It was originally written for SGI IRIX, but it was later ported to Linux, Mac OS, and Windows, and it was renamed CILK ++ and commercialized. The commercial company is called CILK Arts. Academic licenses for CILK are free for use and distribution. CILK ++ is compatible with C/C++ libraries in both directions.
CILK
- Small set of keyword extensions to C/C++ to support shared memory parallelism
- Data structures called "hyper objects" that ensure that you do not overwrite variables
- Thread scheduling and memory management
- Cilkscreen - to detect race conditions
- Cilkview - to determine the degree of parallelism and scaling in your serial code
Pthreads
- Like MPI, a low-level building block
- 60 functions, with long argument lists of pointers
- Look in /usr/include/pthread.h
- Sequential code must be restructured and rewritten to include pthreads
- Thread management is up to user
OpenMP
- Takes sequential code and parallelizes it without rewriting the whole code
- However, you must declare which variables are shared and which are local
- Uses preprocessor #pragmas
- Whereas a CILK ++ program is guaranteed to compile as a sequential program, an OpenMP program may compile only as a parallel program
- OpenMP 3.0 may support more parallelism; previous versions were for specific applications
- OpenMP 3.0 has over 30 #pragmas, plus control and environment variables
- "Work stealing" and "Workers" - CILK gets good computation throughput by running parallel work whenever a worker is available. In other words, it's more efficient than OpenMP.
Convolution Example
- Single thread: 9.673 seconds
- OMP
>> export OMP_NUM_THREADS=16
>> time ./MyConv2-OMP
- 2.293 seconds
- CILK
- For examples: /Projects/SIRAF-ADM/users/cchan/cilk/examples/qsort
- If that doesn't work: /opt/cilk (?)
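As a rough check on the OpenMP timings above, the speed-up and the per-thread efficiency can be worked out directly. This is only an arithmetic sketch: the two timings and the thread count come from the notes, while the function names are made up for illustration.

```python
# Timings reported in the convolution example above.
SERIAL_SECONDS = 9.673   # single thread
OMP_SECONDS = 2.293      # OpenMP run with OMP_NUM_THREADS=16
OMP_THREADS = 16


def speedup(serial_seconds, parallel_seconds):
    """Return how many times faster the parallel run is."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, workers):
    """Return the speed-up per worker (1.0 would be perfect scaling)."""
    return speedup(serial_seconds, parallel_seconds) / workers


if __name__ == "__main__":
    s = speedup(SERIAL_SECONDS, OMP_SECONDS)
    e = efficiency(SERIAL_SECONDS, OMP_SECONDS, OMP_THREADS)
    print(f"OpenMP speed-up: {s:.2f}x")
    print(f"Per-thread efficiency: {e:.0%}")
```

The OpenMP run works out to roughly a 4.2x speed-up on 16 threads, or about 26% efficiency per thread.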
Important Notes
Always check that your code gets a sufficient speedup before allocating CPUs to run it. This is done using the following command:
>> cilkview filename
07-01-09: SIRAF Users Meeting
Topics:
GPU Programming for Matlab (J Bryant)
GPU Programming for Python and IDL (C Chan)
GPU Programming for Matlab
A GPU (graphics processing unit) is a highly specialized processor built to do certain computations very quickly, for instance those needed to display graphics. A lot of processors are packed onto a single board. It can be very efficient for linear processes.
Accelereyes - Jacket for Matlab links the Matlab interface to CUDA and the NVIDIA libraries.
Add the Jacket engine path to the Matlab path.
Basic Linear Algebra Subprograms (BLAS) example: blas_example runs on the CPU and the GPU and prints out the speed-up ratio.
GPU is 15x faster for simple matrix operations (for example, matrix multiply, rotate, add, FFT).
Exporting commands to GPU adds a large overhead in computation time.
gfor - parallel for loop restricted to GPU computing. If each loop iteration requires a lot of memory, then it's not worth it (that is, it will not speed up the computation). gfor is faster the second time you run it.
Comparison | Speed-up
gfor (GPU) vs for (GPU) | 1.5x
gfor (GPU) vs for (CPU) | 8x
If the matrices are small, the overhead dominates and the GPU doesn't help over the CPU. The cost of exporting data to the GPU matters. As long as it's a decent-sized problem, the GPU will give a speed-up.
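A rough way to see why small problems do not benefit is to model each GPU call as a fixed export overhead plus the compute time divided by the raw GPU speed-up. This is a simplified sketch, not a measurement: the 15x figure comes from the notes above, while the fixed-overhead model, the 0.05 second overhead, and the CPU timings are assumptions chosen for illustration.

```python
def effective_speedup(cpu_seconds, raw_speedup, overhead_seconds):
    """Speed-up actually seen once the export overhead is included.

    Assumes (hypothetically) that the overhead is a fixed cost per call.
    """
    gpu_seconds = overhead_seconds + cpu_seconds / raw_speedup
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    RAW_SPEEDUP = 15.0   # simple matrix operations, from the notes
    OVERHEAD = 0.05      # seconds; hypothetical value
    for cpu_seconds in (0.01, 0.1, 1.0, 10.0):   # hypothetical problem sizes
        s = effective_speedup(cpu_seconds, RAW_SPEEDUP, OVERHEAD)
        print(f"CPU time {cpu_seconds:6.2f} s -> effective speed-up {s:5.2f}x")
```

Under these assumptions the smallest problem runs slower on the GPU than on the CPU, and the speed-up only approaches 15x once the compute time is much larger than the overhead.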
Jacket costs: $100 one node for student, $500 one node academic non-student.
GPU for IDL and Python
GPU libraries
>> ssh username [at] siraf-login.bsd.uchicago.edu
>> cd /opt/gpulib
>> ls (tells you about how to get started)
>> cd IDL
>> cd lib (.pro are procedures)
gpuDiv.pro: has commands on how to program with it in IDL.
>> cd /opt/gpulib/MATLAB/lib
>> less gpuArray.m (Documentation is source code and comments, look at function comments and try it out.)
>> less gputest (testing of each operation and tells the speed-up)
Important note: MOST GPUs only handle single-precision floating point operations. We are waiting for the GTX 64-bit floating point GPUs to become cheaper to buy.
Question: How can we check what kind of GPU card we have on our computers?
Answer: >> /sbin/lspci (tells video cards, ethernet controller, peripheral buses). There is no video card on the master node.
The Big nodes on the cluster have two NVIDIA cards, each one with 1GB of memory.
SUMMARY
GPU is good for:
- Large sized arrays
- Linear algebra problems
- Add/divide/multiply matrices
GPU is not good for:
- If/then statements
- Small matrices
- For loops that need a lot of memory for each iteration
06-03-09: SIRAF Users Meeting
Topics:
Parallel Programming (R Tomek)
Wiki Update (I Reiser)
PARALLEL PROGRAMMING
Parallel computing vs. parallel programming: parallel computing is taking one program and running many copies of it, each copy on a different processor. Parallel programming is taking a single program and running parts of it on multiple processors, for instance taking a for loop and using 8 processors to reduce the total time the loop takes. Note that this is not possible if each iteration of the loop requires information from the previous iteration.
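The for-loop case can be sketched in Python as below. The `work` function and the loop sizes are hypothetical stand-ins. A thread pool is used only to keep the sketch short: in standard CPython, threads do not run Python code on several processors at once, so for CPU-bound work one would swap in `ProcessPoolExecutor`, which has the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def work(i):
    """Stand-in for the body of one loop iteration (hypothetical)."""
    return i * i


def serial_loop(n):
    """Run the iterations one after another."""
    return [work(i) for i in range(n)]


def parallel_loop(n, workers=8):
    """Hand independent iterations to a pool of workers."""
    # Executor.map returns the results in the original iteration order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, range(n)))


def running_total(n):
    """A loop that cannot be split this way: each step needs the previous one."""
    totals, total = [], 0
    for i in range(n):
        total += work(i)
        totals.append(total)
    return totals


if __name__ == "__main__":
    print(parallel_loop(10) == serial_loop(10))
    print(running_total(4))
```

The first two functions give the same answer because no iteration depends on another; `running_total` carries a value from one iteration to the next, so its iterations cannot simply be handed out to separate workers.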
Common libraries:
- Open MP (C, C++, Fortran) - for shared memory
- Open MPI (C, C++, Fortran, Python, Java) - for shared and distributed memory
Library | Pros | Cons
Open MP | Easier to code and debug, | Shared memory only,
Open MPI | Each processor can have its | Harder to code
How to code Open MP for C++: put the following line immediately before a for loop: #pragma omp parallel for
How to code Open MP for Fortran: !$OMP PARALLEL ... !$OMP DO ... !$OMP END PARALLEL
How to code Open MPI: MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD...) etc. Also, make sure you specify the datatypes: MPI_CHAR, MPI_INT, etc.
Submitting to SIRAF:
shm.pe - Shared memory parallel environment
>> echo appname filename > ~/script && qsub -q all.q -pe shm 8 ~/script
dist.pe - Distributed memory parallel environment
>> echo appname filename > ~/script && qsub -q all.q -pe dist 16 ~/script
For MATLAB:
>> qrsh -q small.q -pe shm 4 -now n matlab
This asks for 4 threads to do the math on one of the small nodes. The "-now n" option tells the scheduler not to give up on the request if it can't immediately find enough processors for the requested number of threads. Instead, the job waits until enough processors are available to run it.
Useful websites:
SIRAF WIKI
- We should allow all users to update it.
- Each person needs to get a login name with password, so we can see who updates what.
- We should add the Parallel Programming powerpoint presentation by Rob to the Wiki.
05-20-09 SIRAF Users Meeting
Topics:
Queue submission (I Reiser)
Job monitoring (A Jamieson)
Processes and threads (C Chan)
QUEUE SUBMISSION
Logging in: Log in to the master node.
>> ssh username(at)siraf-login.bsd.uchicago.edu
Submitting jobs: Do not run jobs on the login node. Instead, use qlogin (interactive submissions) or qsub (batch submissions).
Deleting jobs: >> qdel JOB_ID
Submitting array jobs: Used to submit many jobs using the same shell script, but different input files.
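One way a program can pick its own input file inside an array job is sketched below. It assumes the cluster's scheduler is Grid Engine (the qsub, qstat, and qmon commands suggest this), which sets the SGE_TASK_ID environment variable for each task of a job submitted with the -t option (for example, -t 1-3). The file names and the function name are hypothetical.

```python
import os


def input_for_task(input_files, task_id):
    """Return the input file for a 1-based array task ID."""
    index = int(task_id) - 1
    if not 0 <= index < len(input_files):
        raise IndexError(f"task {task_id} has no matching input file")
    return input_files[index]


if __name__ == "__main__":
    # Hypothetical input files, one per array task.
    files = ["case01.dat", "case02.dat", "case03.dat"]
    # Assumption: Grid Engine sets SGE_TASK_ID to the task number.
    # Outside an array job the value is missing or not a number,
    # so fall back to task 1 to keep the script runnable by hand.
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    print(input_for_task(files, task_id))
```

Each task then runs the same script but processes a different file, which is the pattern described above.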
JOB MONITORING
Ways to monitor jobs:
>> top
>> htop
>> qstat
>> qstat -f
>> qstat -f -u '*'
>> qmon&
Qmon - a program for monitoring the cluster. The two most heavily used features in Qmon are Queue Control and Job Control, the two buttons at the upper left corner of the panel. These will tell you who is running jobs on the cluster and how much memory is being used on each node of the cluster.
PROCESSES AND THREADS
Concurrency - program can be divided into subtasks that can be run independently.
Parallelism - program can be divided into subtasks that can be run simultaneously.