Use this tag for questions about the Roofline performance model, a visually intuitive method that gives insight into the efficiency, bottlenecks, and optimization potential of high-performance software. The model combines the hardware's peak compute and memory-bandwidth values with the measured memory, cache, and compute throughput of the software. The Roofline model was formalized at Berkeley in 2008, and Roofline charts can be generated with Intel Advisor or the Berkeley ERT tool.
Questions tagged [roofline]
14 questions
9
votes
1 answer
Roofline model: calculating operational intensity
Say I have a toy loop like this
float x[N];
float y[N];
for (int i = 1; i < N-1; i++)
    y[i] = a*(x[i-1] - x[i] + x[i+1]);
And I assume my cache line is 64 bytes (i.e. big enough). Then I will have (per frame) basically 2 accesses to the RAM and 3…
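For the toy loop above, the operational-intensity estimate can be sketched as follows. This is a minimal sketch, assuming an ideal cache in which each element of `x` is fetched from RAM only once and each element of `y` is written once, and counting three floating-point operations per iteration (one subtraction, one addition, one multiplication). The function name `stencil3_intensity` and the `write_allocate` switch are hypothetical, introduced only for illustration.

```c
/* Hypothetical helper: operational intensity (FLOP per byte of RAM traffic)
 * of  y[i] = a*(x[i-1] - x[i] + x[i+1])  under an ideal-cache assumption.
 * write_allocate != 0 models hardware that reads the y cache line
 * before overwriting it (an assumption, not stated in the question). */
double stencil3_intensity(int write_allocate)
{
    const double flops_per_iter = 3.0;             /* 1 sub + 1 add + 1 mul */
    double bytes_per_iter = 2.0 * sizeof(float);   /* read x[i+1], write y[i] */

    if (write_allocate)
        bytes_per_iter += sizeof(float);           /* extra read of y[i] */

    return flops_per_iter / bytes_per_iter;
}
```

With 4-byte floats this gives 3/8 = 0.375 FLOP/byte without write-allocate traffic and 3/12 = 0.25 FLOP/byte with it. Both values are well below 1, so on typical hardware the loop would be expected to sit under the memory-bandwidth roof.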

Armen Avetisyan
2
votes
1 answer
Roofline model - how to calculate flop/byte ratio?
I would like to create a roofline model, and I have a problem with the algorithm's flop-per-byte ratio. Can you explain how to calculate it? The algorithm performs its computation using a 5-point stencil.
Here's the algorithm:
for(int i=1; i

JudgeDeath
1
vote
1 answer
Roofline Model with CUDA Manual vs. Nsight Compute
I have a very simple vector addition kernel written for CUDA.
I want to calculate the arithmetic intensity as well as GFLOP/s for this Kernel.
The values I calculate differ visibly from the values obtained by Nsight Compute's Roofline Analysis…
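A manual estimate for a vector addition of this kind can be sketched as below. The question does not show the kernel, so this assumes the form `c[i] = a[i] + b[i]` with one floating-point addition, two loads, and one store per element, and no cache reuse. The helper names `vec_add_intensity` and `gflops` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: arithmetic intensity (FLOP/byte) of c[i] = a[i] + b[i],
 * assuming every element is moved to or from memory exactly once. */
double vec_add_intensity(size_t elem_size)
{
    const double flops_per_elem = 1.0;                      /* one addition */
    const double bytes_per_elem = 3.0 * (double)elem_size;  /* 2 loads + 1 store */
    return flops_per_elem / bytes_per_elem;
}

/* Hypothetical helper: achieved GFLOP/s from a FLOP count and a
 * measured kernel time in seconds. */
double gflops(double flop_count, double seconds)
{
    return flop_count / seconds / 1e9;
}
```

For `float` data this yields 1/12, roughly 0.083 FLOP/byte. A profiler may report different values if it counts bytes at another level of the memory hierarchy or counts instructions differently, so the comparison only makes sense when both sides measure the same traffic.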

Cherry Toska
1
vote
1 answer
One point is outside of the area of Roofline model
I used the roofline model to analyze code optimization.
But I found that the green point lies outside the area bounded by the bandwidth roof. The program runs without problems.
I don't understand why the green point is not in the area of red…

Lance
1
vote
1 answer
The optimization approach for roofline model
I have some questions about the roofline model, specifically about how to deal with a point that is memory bound.
Questions:
1) If the I0 derived from I0 · BW = Peak is 1.21 and the actual I1 is 0.71, does that mean the actual I1 is memory bound?
2) If I1…
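The check in question 1 can be written out as a short worked calculation. This is a sketch under the usual roofline bound, with $P_{\text{peak}}$ denoting peak compute performance and $BW$ peak memory bandwidth (symbols introduced here, not taken from the question), and with both intensities assumed to be in FLOP/byte.

```latex
\[
P(I) = \min\left(P_{\text{peak}},\; BW \cdot I\right),
\qquad
I_0 = \frac{P_{\text{peak}}}{BW} = 1.21
\]
\[
I_1 = 0.71 < I_0
\;\Rightarrow\;
P(I_1) = BW \cdot I_1 = \frac{I_1}{I_0}\, P_{\text{peak}} \approx 0.59\, P_{\text{peak}}
\]
```

Under these assumptions the point at I1 lies to the left of the ridge point I0, so it falls in the memory-bound region, and its attainable performance is capped at roughly 59% of peak.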

Lance
1
vote
0 answers
Intel Advisor's bandwidth information
While using Intel Advisor's roofline analysis view, we are presented with data-bandwidth information for the different data paths of the system, i.e. DRAM and the L3, L2, and L1 caches. The program claims that it measures the bandwidths on the provided hardware…

Nitin Malapally
1
vote
0 answers
Advice/Guidance on Roofline Model Analysis (Skylake, Thunder X2, Haswell)
I'm learning bandwidth/memory- and CPU-bound performance and roofline graphs at the moment, and I'd love some help/input on how to analyze the following figure.
Roofline figure from "https://www.mdpi.com/2079-3197/8/1/20"
The first analysis I'm…

Forrest
1
vote
1 answer
NSIGHT compute: SOL SM versus Roofline
I ran cuda-11.2 nsight-compute on my cuda kernel.
It reports that SOL SM is at 79.44%, which I interpret as being pretty close to maximum. SOL L1 is at 48.38%.
When I examine the Roofline chart, I see that my measured result is very far away from peak…

Bram
1
vote
1 answer
Intel Advisor: Inspect method including all submethods
Using Intel Advisor and the roofline model, I would like to assess the performance of a certain function. This function uses the Eigen library for matrix operations, where most of the work is done.
In the output I can see my function with a…

carlosvalderrama
1
vote
1 answer
Report FLOPs with Intel Advisor XE
I am using Intel Advisor 2018 (build 523188) on Linux CentOS 7.4 to profile a collection of benchmarks (I want to plot them all in a single Roofline plot), and I am using the command line tool advixe-cl to collect the survey, tripcounts and flops…

K. Iliakis
0
votes
0 answers
How does Intel Advisor measure L1, L2 and L3 bandwidths for loops and functions? Are there PMU events which count the bytes transferred?
I am using the Intel Advisor cache-aware roofline feature and wanted to know how Intel Advisor measures the core-to-L1 data cache bandwidth of the application.
The application is run twice, once for collecting timing information for loops and…

sham1810
0
votes
0 answers
How to calculate Arithmetic Intensity?
I have the following code snippet, for which I have to calculate the arithmetic intensity.
const int N = 8192;
float a[N], b[N], c[N], d[N];
...
#pragma omp parallel for simd
for(int i = 0; i < N; i++)
{
    const float tmp_a = a[i];
    const float tmp_b…

Jeet
0
votes
1 answer
Question about bandwidth ceilings in roofline models
I don't quite understand the bandwidth factor in roofline models as described on Wikipedia (like the picture and its caption shown below):
why can the intersection between β × I and the axes change? Why could there be performance while operation…

gasoon
0
votes
1 answer
Roofline model: How does increasing Arithmetic Intensity allow room for improvements to performance?
Intel Tip: If you can’t break a memory roof, try to rework your algorithm for
higher arithmetic intensity. This will move you to the right and give
you more room to increase performance before hitting the memory
bandwidth roof.
For algorithms in…
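The effect described in the tip can be illustrated with the basic roofline bound, where attainable performance is the smaller of the compute peak and bandwidth times arithmetic intensity. This is a minimal sketch. The function name `roofline_bound` and the machine numbers in the usage note are hypothetical and are not taken from Intel's documentation.

```c
/* Hypothetical helper: roofline upper bound in GFLOP/s for a kernel with
 * arithmetic intensity `ai` (FLOP/byte) on a machine with the given peak
 * compute rate (GFLOP/s) and peak memory bandwidth (GB/s). */
double roofline_bound(double peak_gflops, double bw_gbs, double ai)
{
    double memory_roof = bw_gbs * ai;   /* sloped part of the roofline */

    /* The bound is whichever roof is lower at this intensity. */
    return memory_roof < peak_gflops ? memory_roof : peak_gflops;
}
```

On an illustrative machine with 100 GFLOP/s peak and 50 GB/s bandwidth, a kernel at 0.5 FLOP/byte is capped at 25 GFLOP/s, while the same work restructured to reach 1.0 FLOP/byte is capped at 50 GFLOP/s. Past the ridge point at 2.0 FLOP/byte the bound stays at the 100 GFLOP/s compute peak, so further increases in intensity no longer raise it.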

Cibin Joseph