I am trying to measure the number of floating-point operations performed by a C++ program (FLOPS). I am using a Broadwell-based CPU and no GPU. I ran the following command, which includes all the FP-related events I could find:
perf stat -e fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,fp_arith_inst_retired.double,fp_arith_inst_retired.packed,fp_arith_inst_retired.scalar,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.single,inst_retired.x87 ./test_exe
I got the following output:
Performance counter stats for './test_exe':
0 fp_arith_inst_retired.128b_packed_double (36.36%)
0 fp_arith_inst_retired.128b_packed_single (36.36%)
0 fp_arith_inst_retired.256b_packed_double (36.37%)
0 fp_arith_inst_retired.256b_packed_single (36.37%)
4,520,439,602 fp_arith_inst_retired.double (36.37%)
0 fp_arith_inst_retired.packed (36.36%)
4,501,385,966 fp_arith_inst_retired.scalar (36.36%)
4,493,140,957 fp_arith_inst_retired.scalar_double (36.37%)
0 fp_arith_inst_retired.scalar_single (36.36%)
0 fp_arith_inst_retired.single (36.36%)
82,309,806 inst_retired.x87 (36.36%)
65.861043789 seconds time elapsed
65.692904000 seconds user
0.164997000 seconds sys
Questions:
- Although the C++ program is a large project, I did not use any SSE/AVX instructions, and I am not familiar with the SSE/AVX instruction sets. The project is written in "ordinary" C++. Why does it show such large counts for fp_arith_inst_retired.double, fp_arith_inst_retired.scalar and fp_arith_inst_retired.scalar_double? These counters are related to SSE/AVX computations, right?
- What do the percentages in brackets mean, such as (36.37%)?
- How can I compute the FLOPS of my C++ program based on the perf results?
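For the last question, the calculation I have in mind is roughly the sketch below. It rests on several assumptions that may be wrong: each scalar instruction counts as one operation, each packed instruction counts as one operation per vector lane, the aggregate events (.double, .single, .scalar, .packed) are skipped so that nothing is counted twice, inst_retired.x87 is ignored, and the printed counts are taken at face value. The names FLOPS_PER_INSTRUCTION and estimate_flop_count are my own and not part of perf.

```python
# Assumed floating-point operations per retired instruction (one per vector lane).
FLOPS_PER_INSTRUCTION = {
    "fp_arith_inst_retired.scalar_double": 1,
    "fp_arith_inst_retired.scalar_single": 1,
    "fp_arith_inst_retired.128b_packed_double": 2,  # 128 bits / 64-bit double
    "fp_arith_inst_retired.128b_packed_single": 4,  # 128 bits / 32-bit float
    "fp_arith_inst_retired.256b_packed_double": 4,  # 256 bits / 64-bit double
    "fp_arith_inst_retired.256b_packed_single": 8,  # 256 bits / 32-bit float
}


def estimate_flop_count(counts):
    """Weighted sum of the per-width counters; missing events count as zero."""
    return sum(
        weight * counts.get(event, 0)
        for event, weight in FLOPS_PER_INSTRUCTION.items()
    )


# Counter values and elapsed time copied from the perf stat output above.
counts = {
    "fp_arith_inst_retired.scalar_double": 4_493_140_957,
    "fp_arith_inst_retired.scalar_single": 0,
    "fp_arith_inst_retired.128b_packed_double": 0,
    "fp_arith_inst_retired.128b_packed_single": 0,
    "fp_arith_inst_retired.256b_packed_double": 0,
    "fp_arith_inst_retired.256b_packed_single": 0,
}
elapsed_seconds = 65.861043789

total_flop = estimate_flop_count(counts)
flops = total_flop / elapsed_seconds
print(f"{total_flop} FLOP, {flops / 1e6:.1f} MFLOPS")
```

Is this the right way to combine the counters, or am I missing something?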
Thanks.