Measuring code performance

Timing the execution of your code in Chapel

The code generated after Exercise “Basic.4” is the full implementation of our simulation. We will be using it as a benchmark, to see how much we can improve the performance with Chapel’s parallel programming features in the following lessons.

But first, we need a quantitative way to measure the performance of our code. Perhaps the easiest way is to see how long a simulation takes to finish. The UNIX command time can be used for this purpose:

$ time ./baseSolver --rows=650 --iout=200 --niter=10_000 --tolerance=0.002 --nout=1000
Temperature at iteration 0: 25.0
Temperature at iteration 1000: 25.0
Temperature at iteration 2000: 25.0
Temperature at iteration 3000: 25.0
Temperature at iteration 4000: 24.9996
Temperature at iteration 5000: 24.9968
Temperature at iteration 6000: 24.987
Temperature at iteration 7000: 24.9639
Final temperature at the desired position [200,200] after 7750 iterations is: 24.9343
The largest temperature difference was 0.00199985
real	0m3.931s
user	0m7.354s
sys	0m9.952s

The real time is what interests us. Our code takes around 3.9 seconds from the moment it is called at the command line until it returns. Sometimes, however, it is useful to time specific parts of the code. This can be achieved by modifying the code to output the information that we need, a process called instrumentation.

An easy way to instrument our code in Chapel is to use the Time module. Modules in Chapel are libraries of useful functions and methods that can be used in our code once the module is loaded. To load a module, we use the keyword use followed by the name of the module. Once the Time module is loaded, we can create a variable of type Timer and use its methods start, stop and elapsed to instrument our code.

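use Time;            // load the Time module
var watch: Timer;    // timer used to measure the main loop
watch.start();       // start timing just before the loop
while (count < niter && delta >= tolerance) do {
  ...
}
watch.stop();        // stop timing as soon as the loop ends
writeln('The simulation took ', watch.elapsed(), ' seconds');

$ chpl --fast baseSolver.chpl -o baseSolver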
$ ./baseSolver --rows=650 --iout=200 --niter=10_000 --tolerance=0.002 --nout=1000
Temperature at iteration 0: 25.0
Temperature at iteration 1000: 25.0
Temperature at iteration 2000: 25.0
Temperature at iteration 3000: 25.0
Temperature at iteration 4000: 24.9996
Temperature at iteration 5000: 24.9968
Temperature at iteration 6000: 24.987
Temperature at iteration 7000: 24.9639
Final temperature at the desired position [200,200] after 7750 iterations is: 24.9343
The largest temperature difference was 0.00199985
The simulation took 3.9187 seconds
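
The same idea extends to timing several parts of a program separately. The following is a minimal, self-contained sketch rather than part of baseSolver: the names n, data, total, setupWatch and loopWatch are made up for illustration. It also assumes a recent Chapel release, where, as far as we know, the Time module provides this functionality under the type name stopwatch; with an older compiler, the Timer type used above should work the same way.

```chapel
use Time;

// Hypothetical problem size; can be changed with --n=<value> at run time.
config const n = 1_000_000;

// One watch per phase. Assumption: the compiler provides 'stopwatch';
// on older Chapel releases replace it with 'Timer'.
var setupWatch, loopWatch: stopwatch;

// Phase 1: time only the creation and initialization of the array.
setupWatch.start();
var data: [1..n] real;
for i in 1..n do data[i] = i: real;
setupWatch.stop();

// Phase 2: time only the summation loop.
loopWatch.start();
var total = 0.0;
for x in data do total += x;
loopWatch.stop();

writeln('Setup took ', setupWatch.elapsed(), ' seconds');
writeln('Summation took ', loopWatch.elapsed(), ' seconds');
writeln('Sum = ', total);
```

Using one watch per phase keeps the measurements independent, so it is easy to see which part of the program dominates the total run time.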

Exercise “Basic.5”

Try recompiling without --fast and see how it affects the execution time. If it becomes too slow, try reducing the problem size. What is the speedup factor with --fast?
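
The speedup factor is the ratio between the execution time without --fast and the execution time with --fast. As a small sketch, the hypothetical helper program below (the names tSlow and tFast are ours, and the default values are placeholders, not measurements) computes that ratio from two times given on the command line.

```chapel
// Hypothetical helper, not part of baseSolver. Example invocation:
//   ./speedup --tSlow=<seconds without --fast> --tFast=<seconds with --fast>
config const tSlow = 1.0;  // placeholder; replace with your measured time
config const tFast = 1.0;  // placeholder; replace with your measured time

// A value larger than 1 means the --fast build is quicker.
writeln('Speedup factor: ', tSlow / tFast);
```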