I want to collect branch prediction data for a single process on CentOS 7, so I ran "perf stat -B -e branches,branch-misses ./a.out" on the code below.
    #include <algorithm>
    #include <cstdlib>   // std::rand
    #include <iostream>
    #include <vector>

    int main()
    {
        // generate data
        const size_t arraySize = 32768;
        std::vector<int> data(arraySize);
        for (unsigned c = 0; c < arraySize; ++c)
            data[c] = std::rand() % 256;

        // If the data are sorted like shown here the program runs about
        // 6x faster (on my test machine, with -O2)
        // std::sort(data.begin(), data.end());

        long long sum = 0;
        for (unsigned i = 0; i < 100000; ++i)
        {
            for (unsigned c = 0; c < arraySize; ++c)
            {
                // data-dependent branch: hard to predict on unsorted data
                if (data[c] >= 128)
                    sum += data[c];
            }
        }
        std::cout << "sum = " << sum << std::endl;
    }
The code comes from another question: "Why is processing a sorted array faster than processing an unsorted array?"
But the result I get is always zero:
I don't think it's possible for both the branch count and the branch-miss count to be zero. Can someone help me find the cause? Thanks.