Alternatively, if the download is too slow, you can make a copy from a local directory on GUANE and update to the latest version:
```
cp -r /home/xbesseron/debugging-and-profiling .
cd debugging-and-profiling
git pull
```
<br/><br/><br/><br/>
## 1 - GDB Tutorial
### Objective
- Learn the basic commands of GDB
### Instructions
1. Carefully read the page [A GDB Tutorial with Examples](http://www.cprogramming.com/gdb.html)
2. Follow and run the step-by-step example *An Example Debugging Session*
### Notes
- The example program `main.cpp` is available in the `tutorial_gdb` directory.
- The example program waits for you to enter a number as input. If the program or the GDB session appears to be stuck, enter a number (e.g. `3`) and press `Enter`.
<br/><br/><br/><br/>
## 2 - Valgrind Tutorial
### Objective
- Run Valgrind on simple examples and understand the error messages
### Setup
To use Valgrind on GUANE, we need to load the corresponding module:
```
# search for Valgrind
module avail valgrind
# load the module
module load valgrind/3.15.0
```
### Instructions
1. Carefully read the page [Using Valgrind to Find Memory Leaks and Invalid Memory Use](http://www.cprogramming.com/debugging/valgrind.html)
2. Reproduce the execution and the analyses of the tutorials
3. Check the documentation and explanations about the reported errors
### Notes
- If you successfully loaded the Valgrind module as described above, you can skip the first *Getting Valgrind* part.
- The example programs of the tutorial are available in the `tutorial_valgrind` directory.
- Don't forget to compile the example programs: you can just type `make` for that.
- If you get the error `example1: command not found`, just use `./example1` instead.
### More help on Valgrind
- For more info on the command line options of Valgrind, use `man valgrind`
- [Valgrind User Manual](https://valgrind.org/docs/manual/manual.html)
- [Documentation of the Memcheck tool](https://valgrind.org/docs/manual/mc-manual.html)
- [Explanation of error messages from Memcheck](https://valgrind.org/docs/manual/mc-manual.html#mc-manual.errormsgs)
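To get a feel for one kind of Memcheck report before reading the manual, here is a minimal sketch that is not part of the tutorial files. The function names `sum_leaky` and `sum_fixed` are made up for illustration. The first function allocates a buffer and never frees it, which Memcheck is expected to report as a "definitely lost" block when run with `--leak-check=full`. The second performs the same computation with `std::vector` and should produce no leak report.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example: sums 0 + 1 + ... + (n - 1) using a heap buffer
// that is never freed, so the allocation leaks on every call.
long sum_leaky(std::size_t n) {
    assert(n > 0);
    long* values = new long[n];
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = static_cast<long>(i);
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;  // Bug: `delete[] values;` is missing.
}

// Same computation, but std::vector releases its memory automatically
// when it goes out of scope.
long sum_fixed(std::size_t n) {
    assert(n > 0);
    std::vector<long> values(n);
    std::iota(values.begin(), values.end(), 0L);
    return std::accumulate(values.begin(), values.end(), 0L);
}
```

To try it, call both functions from a small `main`, compile with debug information and without optimization (for example `g++ -g -O0`), and run the executable under `valgrind --leak-check=full`. Compiling without optimization matters here because an optimizing compiler may remove the unused allocation altogether.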
<br/><br/><br/><br/>
## 3 - Profiling with Callgrind
### Objective
- Profile a program with Valgrind and optimize it
### Setup
We will use Valgrind.
```
# Remove any previously loaded module
module purge
# Load Valgrind
module load valgrind/3.15.0
```
### Instructions
1. Compile and run the program
An example program is available in the directory `profiling`.
Let's compile it and run it.
```
# Compile the program
cd profiling
make
# Test the program
time ./main
```
It is a bit slow to execute. Can we optimize it?
2. Profile with Valgrind
Let's use Valgrind to profile it:
```
valgrind --tool=callgrind ./main
```
The profiling with Valgrind is slow (about 20-25x slower than the original) and should last around 2 minutes on GUANE for this example.
Valgrind will generate a trace file named `callgrind.out.XXXXX`.
One of the best ways to look at it is to use **KCacheGrind**.
This tool is not installed on GUANE, but you can download the tracefile and visualize it on your laptop.
3. Download the tracefile
Use `scp` (or any `sftp` client) to download the tracefile `callgrind.out.XXXXX` to your computer.
4. Install KCacheGrind
- To install KCacheGrind on Linux, use your package manager. For example, on Ubuntu, run `sudo apt install kcachegrind`.
- On Windows, install [QCacheGrind](https://sourceforge.net/projects/qcachegrindwin/) and the [Visual C++ Redistributable for Visual Studio 2012 Update 4](https://www.microsoft.com/en-us/download/details.aspx?id=30679).
5. Visualize the tracefile with KCacheGrind
Open the tracefile with KCacheGrind. You should obtain something similar to that.

You can also download the source files to visualize the source code in KCacheGrind.
6. Optimize the program
This program contains a beginner C++ mistake that makes it slow.
**Can you figure out what is wrong and improve the performance of the program?**
Tip:
<br/><br/><br/><br/>
## 4 - Bug Hunting
### Objective
- Encounter different types of bugs and experiment with various debugging tools
### Instructions
Programs demonstrating the different kinds of bugs are available in the `exercises` directory.
Try the different debugging tools on every example to see how they behave and find the bugs.
**Can you exterminate all the bugs?**
### Notes
- You can compile each program manually using `gcc` or `icc`. You are encouraged to try both to see how differently they behave. Example: `gcc program.c -o program`. Add any additional parameter you might need.
- To use `gcc` you need to load the module `devtools/gcc/9.2.0`
- To use `icc` you need to load the module `devtools/intel/oneAPI`
- The files are named according to the type of bug they trigger. You can refer to the [slides of the lecture](slides.pdf) for help.
- Look at the comment at the beginning of each `.c` file for tips or specific compilation options.
<br/><br/><br/><br/>
## 5 - Roofline with Intel Advisor
This exercise compares 3 implementations of matrix multiplication:
- Naive algorithm
- Block algorithm
- Using the Eigen library

with 2 different sets of compilation options:
- without vectorization instructions
- with vectorization instructions
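
As a rough illustration of the difference between the first two variants, here is a minimal sketch. It is not the code from the `roofline` directory: the row-major `std::vector<double>` storage, the function names, and the block-size parameter `bs` are all assumptions made for this example. The blocked version performs the same arithmetic as the naive one but works on small tiles, so that the data touched in the inner loops is more likely to stay in cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square n x n matrices stored in row-major order.
using Matrix = std::vector<double>;

// Naive triple loop: C = A * B.
Matrix matmul_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Blocked (tiled) variant: same result, computed tile by tile.
// `bs` is the tile edge length; it does not need to divide n.
Matrix matmul_blocked(const Matrix& a, const Matrix& b, std::size_t n,
                      std::size_t bs) {
    assert(bs > 0);
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs) {
        for (std::size_t kk = 0; kk < n; kk += bs) {
            for (std::size_t jj = 0; jj < n; jj += bs) {
                // Multiply one tile of A by one tile of B.
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i) {
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions perform the same number of floating-point operations; only the order of memory accesses differs, which is the kind of difference the roofline plot helps to expose.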
### Objectives
- Compare the performance of different implementations of the same algorithm
- Use Intel Advisor for the Roofline Analysis
### Setup
To use the graphical interface of Intel Advisor, we need to enable X forwarding with the `-X` option of SSH.
- If using Windows, you need to install an X server, for example with [MobaXterm](https://mobaxterm.mobatek.net/).
### Instructions
1. Compile and run the program
An example program is available in the directory `roofline`.
Let's compile it and run it.
```
# Compile the program
cd roofline
make
```
Two executables are compiled:
- `matmul_all_novec` without SIMD instructions
- `matmul_all_vec` with SIMD instructions
Run the two executables and compare the performance:
```
./matmul_all_novec
```
```
./matmul_all_vec
```
2. Start Intel Advisor
Start the GUI:
```
advisor-gui &
```
The Intel Advisor interface will appear after some time (the connection might be slow).
3. Profile one executable
Run the roofline analysis in the Intel Advisor GUI:
- Choose *Create Project* and set the project name
- Select one of the executables above as the *Application* and click *OK*
- Select *CPU / Memory Roofline Insights* and click *Choose*

- Click the *Play* (triangle) / *Start Survey* button

The analysis takes a bit of time. Advisor runs the program twice: once to collect timing data, and once to count the data accesses and floating-point operations.
4. Explore the plot
- Identify the *roof* lines of the plot:
  - Maximum floating-point throughput (FLOP/s) for scalar and vectorized instructions?
  - Maximum bandwidth for RAM and cache accesses?
- Identify the different loops for the different algorithms: not that easy :-)
- Theoretical analysis of the *naive* matrix multiplication:
  - How much data is accessed by the algorithm? (read and write)
  - How many floating-point operations are performed?
  - What is the arithmetic intensity? Does it match the one found by Intel Advisor?
  - What appears to be the bottleneck for this algorithm on this machine?
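
The theoretical questions above can be checked with a small back-of-the-envelope helper. This is only a sketch under stated assumptions: square `n x n` matrices, a naive triple loop doing one multiplication and one addition per inner iteration, and made-up names (`RooflineEstimate`, `estimate_ideal`, `estimate_no_reuse`). It computes two bounds on the arithmetic intensity: an optimistic one where each matrix element is moved only once, and a pessimistic one where every operand of the inner loop is loaded again. The value measured by a tool may be closer to one or the other depending on how it counts memory traffic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: operation count, memory traffic, and their ratio.
struct RooflineEstimate {
    double flops;      // floating-point operations
    double bytes;      // bytes read and written
    double intensity;  // arithmetic intensity in FLOP/byte
};

// Naive n x n matrix multiplication: n^3 multiplications + n^3 additions.
double naive_flops(std::size_t n) {
    const double dn = static_cast<double>(n);
    return 2.0 * dn * dn * dn;
}

// Optimistic bound: A and B are each read once and C is written once,
// i.e. 3 * n^2 elements in total (perfect reuse in cache).
RooflineEstimate estimate_ideal(std::size_t n, std::size_t elem_bytes) {
    assert(n > 0 && elem_bytes > 0);
    const double dn = static_cast<double>(n);
    const double flops = naive_flops(n);
    const double bytes = 3.0 * dn * dn * static_cast<double>(elem_bytes);
    return {flops, bytes, flops / bytes};
}

// Pessimistic bound: each inner iteration loads one element of A and one
// of B (2 * n^3 loads), and each element of C is stored once (n^2 stores).
RooflineEstimate estimate_no_reuse(std::size_t n, std::size_t elem_bytes) {
    assert(n > 0 && elem_bytes > 0);
    const double dn = static_cast<double>(n);
    const double flops = naive_flops(n);
    const double bytes =
        (2.0 * dn * dn * dn + dn * dn) * static_cast<double>(elem_bytes);
    return {flops, bytes, flops / bytes};
}
```

With 8-byte `double` elements, the optimistic intensity grows linearly with `n` (it equals `n / 12` FLOP/byte), while the pessimistic one stays just below `1/8` FLOP/byte whatever the matrix size. Comparing these two numbers with the position of the naive loop on the roofline plot gives a hint about how well the caches are being used.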