Commit b98a69e1 authored by Xavier Besseron

PS 2021

[[_TOC_]]
# Debugging and Performance Engineering
**Lecture**
- [Slides of the lecture](slides.pdf)
- Don't hesitate to refer to the slides to complete the exercises
**Practical Session**
- Instructions: https://gitlab.uni.lu/SC-Camp/2021/debugging-and-profiling
- First, check the part *0 - Pre-requisites*
- If you're not familiar with GDB or Valgrind, go through part *1 - GDB Tutorial* and/or part *2 - Valgrind Tutorial*
- Exercises 3, 4 and 5 are independent and can be done in any order
- Solutions and explanations will be pushed to the repository at the end of the day
<br/><br/><br/><br/>
## 0 - Pre-requisites
For this practical session, we will use the GUANE cluster.
Unless indicated otherwise, you should connect to a computing node for the tutorials and exercises.
### Connect to a computing node of the GUANE cluster
Before connecting to a computing node, you need to connect to the cluster frontend and reserve a resource.
First, reach the GUANE access node via toctoc:
```
[username@laptop ~]$ ssh username@167.249.40.26
[username@toctoc ~]$ ssh guane
```
Start an interactive session with 1 task and 4 cores:
```
[username@guane ~]$ srun -n 1 -c 4 --time 8:0:0 --pty bash
```
### Download the practical session materials
On the cluster, run the following command to download all the exercises:
```
git clone https://gitlab.uni.lu/SC-Camp/2021/debugging-and-profiling.git
```
Alternatively, if the download is too slow, you can make a copy from a local directory on GUANE and update to the latest version:
```
cp -r /home/xbesseron/debugging-and-profiling .
cd debugging-and-profiling
git pull
```
<br/><br/><br/><br/>
## 1 - GDB Tutorial
### Objective
- Learn the basic commands of GDB
### Instructions
1. Read carefully the page [A GDB Tutorial with Examples](http://www.cprogramming.com/gdb.html)
2. Follow and run the step-by-step example *An Example Debugging Session*
### Notes
- The example program `main.cpp` is available in the `tutorial_gdb` directory.
- The example program waits for you to enter a number as input. If the program or the GDB session appears to be stuck, enter a number (e.g. `3`) and press `Enter`.
### More help on GDB
To get help about the GDB commands:
- in GDB prompt, use `help` or `help <command>`
- in the shell, use `man gdb`
- online [GDB documentation](https://sourceware.org/gdb/current/onlinedocs/gdb/)
<br/><br/><br/><br/>
## 2 - Valgrind Tutorial
### Objective
- Run Valgrind on simple examples and understand the error messages
### Setup
To use Valgrind on GUANE, we need to load the corresponding module:
```
# search for Valgrind
module avail valgrind
# load the module
module load valgrind/3.15.0
```
### Instructions
1. Read carefully the page [Using Valgrind to Find Memory Leaks and Invalid Memory Use](http://www.cprogramming.com/debugging/valgrind.html)
2. Reproduce the execution and the analyses of the tutorials
3. Check the documentation and explanations about the reported errors
### Notes
- If you successfully loaded the Valgrind module as described above, you can skip the first *Getting Valgrind* part.
- The example programs of the tutorial are available in the `tutorial_valgrind` directory.
- Don't forget to compile the example programs: you can just type `make` for that.
- If you get the error `example1: command not found`, just use `./example1` instead.
### More help on Valgrind
- For more info on the command line options of Valgrind, use `man valgrind`
- [Valgrind User Manual](https://valgrind.org/docs/manual/manual.html)
- [Documentation of the Memcheck tool](https://valgrind.org/docs/manual/mc-manual.html)
- [Explanation of error messages from Memcheck](https://valgrind.org/docs/manual/mc-manual.html#mc-manual.errormsgs)
<br/><br/><br/><br/>
## 3 - Profiling with Callgrind
### Objective
- Profile a program with Valgrind and optimize it
### Setup
We will use Valgrind.
```
# Remove any previously loaded module
module purge
# Load Valgrind
module load valgrind/3.15.0
```
### Instructions
1. Compile and run the program
An example program is available in the directory `profiling`.
Let's compile it and run it.
```
# Compile the program
cd profiling
make
# Test the program
time ./main
```
It is a bit slow to execute. Can we optimize it?
2. Profile with Valgrind
Let's use Valgrind to profile it:
```
valgrind --tool=callgrind ./main
```
The profiling with Valgrind is slow (about 20-25x slower than the original) and should last around 2 minutes on GUANE for this example.
Valgrind will generate a trace file named `callgrind.out.XXXXX`.
One of the best ways to look at it is to use **KCacheGrind**.
This tool is not installed on GUANE, but you can download the tracefile and visualize it on your laptop.
3. Download the tracefile
Use `scp` (or any `sftp` client) to download the tracefile `callgrind.out.XXXXX` to your computer.
4. Install KCacheGrind
- To install KCacheGrind on Linux, use your package manager. For example, on Ubuntu, run `sudo apt install kcachegrind`.
- On Windows, install [QCacheGrind](https://sourceforge.net/projects/qcachegrindwin/) and the [Visual C++ Redistributable for Visual Studio 2012 Update 4](https://www.microsoft.com/en-us/download/details.aspx?id=30679).
5. Visualize the tracefile with KCacheGrind
Open the tracefile with KCacheGrind. You should obtain something similar to this:
![Visualization with KCacheGrind](profiling/callgrind.png)
You can also download the source files to visualize the source code in KCacheGrind.
6. Optimize the program
This program contains a beginner C++ mistake that makes it slow.
**Can you figure out what is wrong and improve the performance of the program?**
Tip:
<br/><br/><br/><br/>
## 4 - Bug Hunting
### Objective
- Encounter different types of bugs and experiment with various debugging tools
### Instructions
Programs demonstrating the different kinds of bugs are available in the `exercises` directory.
Try the different debugging tools on every example to see how they behave and find the bugs.
**Can you exterminate all the bugs?**
### Notes
- You can compile each program manually using `gcc` or `icc`. You are encouraged to try both to see how differently they behave. Example: `gcc program.c -o program`. Add any additional parameter you might need.
- To use `gcc` you need to load the module `devtools/gcc/9.2.0`
- To use `icc` you need to load the module `devtools/intel/oneAPI`
- The files are named according to the type of bug they trigger. You can refer to the [slides of the lecture](slides.pdf) for help.
- Look at the comment at the beginning of each `.c` file for tips or specific compilation options.
<br/><br/><br/><br/>
## 5 - Roofline with Intel Advisor
This exercise compares 3 implementations of matrix multiplication:
- Naive algorithm
- Block algorithm
- Using the Eigen library

with 2 different sets of compilation options:
- without vectorization instructions
- with vectorization instructions
### Objectives
- Compare the performance of different implementations of the same algorithm
- Use Intel Advisor for the Roofline Analysis
### Setup
To use the graphical interface of Intel Advisor, we need to enable X forwarding with the `-X` option of SSH:
```
[username@laptop ~]$ ssh -X username@167.249.40.26
[username@toctoc ~]$ ssh -X guane
```
Then we use the following trick to allocate a job and connect to its first node using `ssh -X`:
```
[username@guane ~]$ salloc -n 1 -c 4 --time=8:00:00 bash -c 'ssh -X $(scontrol show hostnames | head -n 1)'
```
Once connected to the computing node, load the required modules (Intel compiler, a recent GCC and Eigen).
```
module purge
module load devtools/gcc/9.2.0 devtools/intel/oneAPI libraries/eigen3/3.3.7
```
Load the Intel Advisor module
```
module load advisor/2021.4.0
```
**Note:**
- If using Windows, you need to install an X server, for example with [MobaXterm](https://mobaxterm.mobatek.net/).
### Instructions
1. Compile and run the program
An example program is available in the directory `roofline`.
Let's compile it and run it.
```
# Compile the program
cd roofline
make
```
Two executables are compiled:
- `matmul_all_novec` without SIMD instructions
- `matmul_all_vec` with SIMD instructions
Run the two executables and compare the performance:
```
./matmul_all_novec
```
```
./matmul_all_vec
```
2. Start Intel Advisor
Start the GUI
```
advisor-gui &
```
The Intel Advisor interface will appear after some time (the connection might be slow).
3. Profile one executable
Run the roofline analysis in the Intel Advisor GUI
- Do *Create Project*, set the project name
- Select one of the executables above as the *Application* and click *OK*
- Select *CPU / Memory Roofline Insights* and click *Choose*
![Select analysis in Intel Advisor](roofline/advisor_select_analysis.png)
- Click the *Play* (triangle) / *Start Survey* button
![Roofline in Intel Advisor](roofline/advisor.png)
The analysis takes a bit of time. Advisor will run the program twice: once to collect the performance data, and once to count the data accesses and floating-point operations.
4. Explore the plot
- Identify the *roof* lines of the plot:
- Maximum floating-point performance (FLOP/s) for scalar/vectorized instructions?
- Maximum bandwidth for RAM and cache accesses?
- Identify the different loops for the different algorithms: not that easy :-)
- Theoretical comparison of the *naive* matrix multiplication:
- How much data is accessed by the algorithm? (read and write)
- How many floating-point operations are performed?
- What is the arithmetic intensity? Does it match the one found by Intel Advisor?
- What appears to be the bottleneck for this algorithm on this machine?
#include <stdio.h>
// This program lists the arguments it has been called with.
//
// Examples of usage:
//
// ./01-logic_syntax_bugs
// ./01-logic_syntax_bugs param1
// ./01-logic_syntax_bugs param1 2
// ./01-logic_syntax_bugs param1 2 testParam
//
int main(int argc, char** argv)
{
// number of parameters
int nb_params = argc - 1;
// first print a message
if ( nb_params > 1 )
{
printf("This program was called with %i parameters\n", nb_params);
}
else if ( nb_params = 1 )
{
printf("This program was called with only 1 parameter\n");
}
else
{
printf("This program was called without any parameter\n");
}
// print program name and all parameters
printf("program = '%s'\n", argv[0] );
int i;
for ( i = 0 ; i < nb_params ; i++ )
{
printf("parameter %i = '%s'\n", i, argv[i] );
}
return 0;
}
#include <stdio.h>
#include <stdlib.h>
// This program computes factorial.
//
// Examples of usage:
//
// ./02-integer_overflow 4
// ./02-integer_overflow 10
// ./02-integer_overflow 20
//
int factorial(int n)
{
int result = 1;
int i;
for ( i = 2 ; i <= n ; i++ )
{
result *= i;
}
return result;
}
int main(int argc, char** argv)
{
// check number of parameters
if ( argc != 2)
{
printf("Error: exactly one parameter is required!\n");
return 1;
}
// get first parameter
int n = atoi(argv[1]);
int fact = factorial(n);
printf(" fact(%i) = %i\n", n, fact);
return 0;
}
#include <stdio.h>
#include <math.h>
// Notes:
//
// Add -lm on the compilation command line to link with the math library
// Add -Ddivbyzero -Dinvalidop or -Doverflow to compile only the relevant part
int main(int argc, char** argv)
{
#ifdef divbyzero
// Division by zero error
double a = 1.0 / 0.0;
printf("Division by zero: 1.0 / 0.0 = %e\n", a);
#endif
#ifdef invalidop
// Invalid operation
double b = sqrt(-1.0);
printf("Invalid operation: sqrt(-1.0) = %e\n", b);
#endif
#ifdef overflow
// Overflow
double c = exp( 1e30 );
printf("Overflow: exp( 1e30 ) = %e\n", c);
#endif
return 0;
}
#include <stdio.h>
#include <stdlib.h>
// Notes:
//
// Add -Dfailedalloc -Ddoublefree -Dfreenonalloc or -Ddoublealloc to compile only the relevant part
int main(int argc, char** argv)
{
#ifdef failedalloc
short SIZE = 1111;
double *array = malloc( sizeof(double) * -SIZE );
array[0] = 2.0;
free(array);
#endif
#ifdef doublefree
int *p = malloc( 2 * sizeof(int) );
free(p);
free(p);
#endif
#ifdef freenonalloc
double d[100];
free(d);
#endif
#ifdef doublealloc
void *p;
p = malloc( 100 );
p = malloc( 10 );
free(p);
#endif
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// This program will print all its parameters in uppercase
// Convert buffer to uppercase
char* uppercase( char* buffer )
{
int size = strlen(buffer);
char* buffer_up = malloc( size+1 );
int i;
for ( i = 0 ; i < size ; i++ )
{
if ( buffer[i] >= 'a' && buffer[i] <= 'z' )
buffer_up[i] = buffer[i] + 'A' - 'a';
else
buffer_up[i] = buffer[i];
}
buffer_up[size] = '\0';
return buffer_up;
}
int main(int argc, char** argv)
{
int i;
for( i = 0 ; i < argc ; i++ )
{
printf("%s ", uppercase(argv[i]) );
}
printf("\n");
return 0;
}
#include <stdio.h>
#include <stdlib.h>
// Notes:
//
// Add -Duninitstatic -Duninitdynamic or -Duninitnonalloc to compile only the relevant part
int main(int argc, char** argv)
{
#ifdef uninitstatic
double x,y;
x = y + 2;
printf("x = %f\n",x);
printf("y = %f\n",y);
#endif
#ifdef uninitdynamic
int size = 10;
double *array = malloc( sizeof(double) * size );
int i;
for( i = 1 ; i < size ; i++ )
array[i] = array[i-1];
for( i = 0 ; i < size ; i++ )
printf(" array[%i] = %f\n", i, array[i] );
free(array);
#endif
#ifdef uninitnonalloc
int size = 10;
double *array1, *array2 = malloc( sizeof(double) * size );
int i;
for( i = 0 ; i < size ; i++ )
array2[i] = array1[i];
for( i = 0 ; i < size ; i++ )
printf(" array2[%i] = %f\n", i, array2[i] );
free(array2);
#endif
return 0;
}
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv)
{
int size = 100;
double *fibo = malloc( sizeof(double) * size );
fibo[0] = 1; fibo[1] = 1;
int i;
for( i = 1 ; i < size ; i++ )
{
fibo[i] = fibo[i-1] + fibo[i-2];
}
printf(" fibo = %f\n", fibo[size] );
free(fibo);
return 0;
}
#include <stdio.h>
#include <stdlib.h>
int fibo(int n)
{
if (n == 1)
return 1;
else
return fibo(n-1) + fibo(n-2);