
Here is what I'm doing. My application takes points from the user while dragging and displays a filled polygon in real time.

It basically adds the mouse position on each MouseMove. Each point is a USERPOINT and has bezier handles, because eventually I will do bezier curves, and this is why I must transfer them into a vector.

So basically MousePos -> USERPOINT. The USERPOINT gets added to a std::vector<USERPOINT>. Then in my UpdateShape() function, I do this:

DrawingPoints is defined like this:

std::vector<std::vector<GLdouble>> DrawingPoints;


Contour[i].DrawingPoints.clear();

for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);

SetCubicBezier() currently looks like this:

void OGLSHAPE::SetCubicBezier(USERFPOINT &a, USERFPOINT &b, int &currentcontour)
{
    std::vector<GLdouble> temp(2);

    if(a.RightHandle.x == a.UserPoint.x && a.RightHandle.y == a.UserPoint.y
        && b.LeftHandle.x == b.UserPoint.x && b.LeftHandle.y == b.UserPoint.y)
    {
        // Both handles coincide with their anchor points, so this segment
        // is a straight line: just emit the two endpoints.
        temp[0] = (GLdouble)a.UserPoint.x;
        temp[1] = (GLdouble)a.UserPoint.y;

        Contour[currentcontour].DrawingPoints.push_back(temp);

        temp[0] = (GLdouble)b.UserPoint.x;
        temp[1] = (GLdouble)b.UserPoint.y;

        Contour[currentcontour].DrawingPoints.push_back(temp);
    }
    else
    {
        // do cubic bezier calculation
    }
}

So for the cubic bezier step, I need to convert USERPOINTs into GLdouble[2] (since the GLU tessellator takes a plain array of doubles).

So I did some profiling. At ~ 100 points, the code:

for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);

took 0 ms to execute. Then, at around 120 points, it jumps to 16 ms and never looks back. I'm positive this is due to std::vector. What can I do to keep it at 0 ms? I don't mind using lots of memory while generating the shape and then trimming the excess once the shape is finalized, or something along those lines.

jmasterx
  • Have you tried comparing the performance with manually dynamically allocating an array of similar size using `new`? – James McNellis Jul 11 '10 at 19:50
  • I'd like to use a container since they are convenient – jmasterx Jul 11 '10 at 19:52
  • @Jex: Yes, agreed, but it's always a good idea to measure the performance of several ways of accomplishing the task. If you are going to blame `vector` for certain performance characteristics, you should compare the performance against allocating the memory yourself and see how that performs; this can help you narrow down what the bottleneck might be. – James McNellis Jul 11 '10 at 19:54
  • It looks like `temp` never goes above a size of 4. If this is so, an automatic array would be much faster. – GManNickG Jul 11 '10 at 19:55
  • @Jex: Actually, `list` is arguably the worse container. :) It won't perform faster, unless you're inserting and removing things from the middle *often*. – GManNickG Jul 11 '10 at 19:59
  • possible duplicate (comparing c array vs vector, especially the answers are very interesting): http://stackoverflow.com/questions/1945777/c-array-vs-vector – rubenvb Jul 11 '10 at 20:01
  • @Jex: you keep asking question complaining about performance and blaming `std::vector`. Perhaps you need to profile and identify the actual cause (since `std::vector` is a wrapper around a native array so it's performance is pretty optimal for lots of cases). – Evan Teran Jul 13 '10 at 02:28

4 Answers


0 ms is no time... nothing executes in no time. This should be your first indicator that you need to check your timing method before trusting your timing results.

Namely, timers typically don't have good resolution. Your pre-16 ms results are probably actually anywhere from 1 ms to 15 ms being incorrectly reported as 0 ms. In any case, if we could tell you how to keep it at 0 ms, we'd be rich and famous.

Instead, find out which parts of the loop take the longest, and optimize those. Don't work towards an arbitrary time measure. I'd recommend getting a good profiler to get accurate results. Then you don't need to guess what's slow (something in the loop), but can actually see what part is slow.
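
For what it's worth, here is a minimal sketch of how that snippet could be timed with a higher-resolution clock. This assumes Windows (GetTickCount comes up in the comments below), and the loop body is just a placeholder for whatever you want to measure:

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);     // counter ticks per second

    QueryPerformanceCounter(&start);
    // ... the SetCubicBezier loop (or whatever code is under test) ...
    QueryPerformanceCounter(&stop);

    double ms = (stop.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << "elapsed: " << ms << " ms\n";
    return 0;
}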

GManNickG
  • @Jex: Or your algorithm causing too much allocation. There's more than one way to fix these things. I'm going to make my answer CW since it's admittedly more about timing and less about optimizing. – GManNickG Jul 11 '10 at 19:56
  • "if we could tell you how to keep it at 0ms, we'd be rich and famous" - liked that very much. – Jul 11 '10 at 19:56
  • @Jex Or you not knowing what you are doing - could that be even vaguely possible? Searching for something other than themselves to blame is not what a good craftsman does. – Jul 11 '10 at 19:58
  • The Windows timer (used by `GetTickCount`) has a resolution of only 15-16 ms, which would explain the results. – dan04 Jul 11 '10 at 20:28

You could use vector::reserve() to avoid unnecessary reallocations in DrawingPoints:

Contour[i].DrawingPoints.reserve(Contour[i].UserPoints.size());
for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x) {
   ...
}
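
As a rough follow-up sketch (not the poster's exact code): in the straight-line branch shown in the question, each segment pushes two points, so a closer capacity estimate for that case might look like this:

// Sketch only: assumes the straight-line branch of SetCubicBezier, which
// emits two points per segment; curved segments would emit more, so treat
// this as a lower-bound estimate for the capacity.
if(Contour[i].UserPoints.size() >= 2)
{
    std::vector< std::vector<GLdouble> > &pts = Contour[i].DrawingPoints;
    pts.clear();
    pts.reserve(2 * (Contour[i].UserPoints.size() - 1));

    for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
        SetCubicBezier(
            Contour[i].UserPoints[x],
            Contour[i].UserPoints[x + 1],
            i);
}

Per GManNickG's comment on the question, it may also be worth avoiding the heap-allocated `temp` vector inside SetCubicBezier, since it is constructed on every call.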
sth

If you actually timed only the second code snippet (as you stated in your post), then you're probably just reading from the vector. That means the cause cannot be the reallocation cost of the vector. In that case, it may be due to CPU cache effects: small datasets can be read at lightning speed from the CPU cache, but whenever the dataset is larger than the cache (or when alternately reading from different memory locations), the CPU has to go out to RAM, which is distinctly slower than cache access.

If the part of the code you profiled appends data to the vector, then use std::vector::reserve() with an appropriate capacity (the number of expected entries) before filling it.

However, keep two general rules for profiling/benchmarking in mind:

1) Use a time measurement method with high resolution (as others have stated, the resolution of your timer IS too low).

2) In any case, run the code snippet more than once (e.g. 100 times), take the total time of all runs and divide it by the number of runs. This will give you some REAL numbers; see the sketch below.
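
A rough sketch of what rules 1 and 2 might look like together, again assuming Windows and QueryPerformanceCounter; RebuildShape() is a hypothetical stand-in for the profiled snippet:

#include <windows.h>
#include <iostream>

void RebuildShape()
{
    // Hypothetical stand-in: put the code being measured here,
    // e.g. the SetCubicBezier loop from the question.
}

int main()
{
    const int runs = 100;               // or 10,000+ for very fast snippets

    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    for(int r = 0; r < runs; ++r)
        RebuildShape();
    QueryPerformanceCounter(&stop);

    double totalMs = (stop.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << "average per run: " << totalMs / runs << " ms\n";
    return 0;
}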

Frunsi
  • 100 times is not enough. Try at least 10,000 times, and if that's still imperceptibly fast, try a million times. – dan04 Jul 11 '10 at 20:27
  • I did 50 times and: ( / 50) 100 points: 2.36, 150: 4.36, 250: 9.4 – jmasterx Jul 11 '10 at 20:28
  • @Jex: lets see, 250x2xsizeof(double)=4000 bytes, thats below a typical cache line size, but may still be too large. a detailed analysis would exceed space and time here ;-). but you may try using a single vector, instead of one vector per contour (this improves locality and makes it easier for the cpu to manage ram accesses) – Frunsi Jul 11 '10 at 20:35
  • @frunsi: What platform has a 4,000 byte cache line? – James McNellis Jul 12 '10 at 02:02
  • @james: the future shiny new platform!? no, my fault, i meant cache size (not cache line size). – Frunsi Jul 12 '10 at 02:28

There's a lot of guessing going on here. Good guesses, I imagine, but guesses nevertheless. And when you try to measure the time functions take, that doesn't tell you how they spend it. You can see that the time changes when you try different things, and from that you can get some hint of what was taking the time, but you can't really be certain.

If you really want to know what's taking the time, you need to catch it when it's taking that time, and find out for certain what it's doing. One way is to single-step it at the instruction level through that code, but I suspect that's out of the question. The next best way is to get stack samples. You can find profilers that are based on stack samples. Personally, I rely on the manual technique, for the reasons given here.

Notice that it's not really about measuring time. It's about finding out why that extra time is being spent, which is a very different question.

Mike Dunlavey