I can see with the CPU profiler, that the compute_variances()
is the bottleneck of my project.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
75.63 5.43 5.43 40 135.75 135.75 compute_variances(unsigned int, std::vector<Point, std::allocator<Point> > const&, float*, float*, unsigned int*)
19.08 6.80 1.37 readDivisionSpace(Division_Euclidean_space&, char*)
...
Here is the body of the function:
void compute_variances(size_t t, const std::vector<Point>& points, float* avg,
float* var, size_t* split_dims) {
for (size_t d = 0; d < points[0].dim(); d++) {
avg[d] = 0.0;
var[d] = 0.0;
}
float delta, n;
for (size_t i = 0; i < points.size(); ++i) {
n = 1.0 + i;
for (size_t d = 0; d < points[0].dim(); ++d) {
delta = (points[i][d]) - avg[d];
avg[d] += delta / n;
var[d] += delta * ((points[i][d]) - avg[d]);
}
}
/* Find t dimensions with largest scaled variance. */
kthLargest(var, points[0].dim(), t, split_dims);
}
where kthLargest()
doesn't seem to be a problem, since I see that:
0.00 7.18 0.00 40 0.00 0.00 kthLargest(float*, int, int, unsigned int*)
The compute_variances()
takes a vector of vectors of floats (i.e. a vector of Points
, where Points
is a class I have implemented) and computes the variance of them, in each dimension (with regard to the algorithm of Knuth).
Here is how I call the function:
float avg[(*points)[0].dim()];
float var[(*points)[0].dim()];
size_t split_dims[t];
compute_variances(t, *points, avg, var, split_dims);
The question is, can I do better? I would really happy to pay the trade-off between speed and approximate computation of variances. Or maybe I could make the code more cache friendly or something?
I compiled like this:
g++ main_noTime.cpp -std=c++0x -p -pg -O3 -o eg
Notice, that before edit, I had used -o3
, not with a capital 'o'. Thanks to ypnos, I compiled now with the optimization flag -O3
. I am sure that there was a difference between them, since I performed time measurements with one of these methods in my pseudo-site.
Note that now, compute_variances
is dominating the overall project's time!
[EDIT]
copute_variances()
is called 40 times.
Per 10 calls, the following hold true:
points.size() = 1000 and points[0].dim = 10000
points.size() = 10000 and points[0].dim = 100
points.size() = 10000 and points[0].dim = 10000
points.size() = 100000 and points[0].dim = 100
Each call handles different data.
Q: How fast is access to points[i][d]
?
A: point[i]
is just the i-th element of std::vector, where the second []
, is implemented as this, in the Point
class.
const FT& operator [](const int i) const {
if (i < (int) coords.size() && i >= 0)
return coords.at(i);
else {
std::cout << "Error at Point::[]" << std::endl;
exit(1);
}
return coords[0]; // Clear -Wall warning
}
where coords
is a std::vector
of float
values. This seems a bit heavy, but shouldn't the compiler be smart enough to predict correctly that the branch is always true? (I mean after the cold start). Moreover, the std::vector.at()
is supposed to be constant time (as said in the ref). I changed this to have only .at()
in the body of the function and the time measurements remained, pretty much, the same.
The division in the compute_variances()
is for sure something heavy! However, Knuth's algorithm was a numerical stable one and I was not able to find another algorithm, that would de both numerical stable and without division.
Note that I am not interesting in parallelism right now.
[EDIT.2]
Minimal example of Point
class (I think I didn't forget to show something):
class Point {
public:
typedef float FT;
...
/**
* Get dimension of point.
*/
size_t dim() const {
return coords.size();
}
/**
* Operator that returns the coordinate at the given index.
* @param i - index of the coordinate
* @return the coordinate at index i
*/
FT& operator [](const int i) {
return coords.at(i);
//it's the same if I have the commented code below
/*if (i < (int) coords.size() && i >= 0)
return coords.at(i);
else {
std::cout << "Error at Point::[]" << std::endl;
exit(1);
}
return coords[0]; // Clear -Wall warning*/
}
/**
* Operator that returns the coordinate at the given index. (constant)
* @param i - index of the coordinate
* @return the coordinate at index i
*/
const FT& operator [](const int i) const {
return coords.at(i);
/*if (i < (int) coords.size() && i >= 0)
return coords.at(i);
else {
std::cout << "Error at Point::[]" << std::endl;
exit(1);
}
return coords[0]; // Clear -Wall warning*/
}
private:
std::vector<FT> coords;
};