
I have implemented a simple statistical engine that returns a rolling mean and variance, using a std::deque as the data queue.

The deque is constructed with a number of entries equal to the size of the rolling window.

When a new value arrives, the oldest value is popped off the front and the new one is pushed onto the back.
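
In other words, each update is essentially this (a minimal sketch, not the actual code from my program):

#include <deque>

// Drop the oldest sample and append the newest; the window size stays constant.
void updateWindow(std::deque<double>& window, double newValue)
{
    window.pop_front();         // discard the oldest value
    window.push_back(newValue); // append the new value
}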

I need to be sure that this is not going to grow in memory as it is expected to run as a background task for a long time.

Does std::deque allocate on the heap during use? Are there flags I can use to fix its size?

I am using G++ 4.1.2 on RHEL 5.3

DanS

3 Answers


Essentially, any dynamically sized container allocates memory from the heap. Another question provides an overview of how std::deque is typically implemented.

But in your particular case, the queue always has the same size. If you hit problems with the deque, it might be beneficial to implement a simple fixed-size queue as a circular buffer over a fixed-size array. Such an implementation has fundamentally better memory behaviour, since it never needs to reallocate. Whether that advantage is worth the trouble of implementing it is hard to assess without profiling data.
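
A minimal sketch of such a fixed-size circular buffer (the names and interface are illustrative, not taken from any particular library):

#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer: one allocation in the constructor, no growth afterwards.
class RingBuffer
{
public:
    explicit RingBuffer(std::size_t capacity)       // capacity must be > 0
        : m_data(capacity, 0.0), m_head(0), m_size(0) {}

    // Push a value; once the buffer is full, the oldest value is overwritten.
    void push(double value)
    {
        m_data[m_head] = value;
        m_head = (m_head + 1) % m_data.size();
        if (m_size < m_data.size())
            ++m_size;
    }

    std::size_t size() const { return m_size; }

    // Element 0 is the oldest value currently held.
    double operator[](std::size_t i) const
    {
        std::size_t start = (m_head + m_data.size() - m_size) % m_data.size();
        return m_data[(start + i) % m_data.size()];
    }

private:
    std::vector<double> m_data; // allocated once, never resized
    std::size_t m_head;         // next write position
    std::size_t m_size;         // number of valid entries (<= capacity)
};

The only heap allocation happens in the constructor, so the memory footprint stays fixed for the lifetime of the object.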

Konrad Rudolph
  • Yeah, in this case, the buffer would be pretty simple. Just keep track of the next index to insert at. Once it wraps around, you'll be overwriting old values, giving you exactly the behavior you're looking for. – jpm Jun 28 '11 at 13:35
  • BTW: boost has a circular buffer: http://www.boost.org/doc/libs/1_46_1/libs/circular_buffer/doc/circular_buffer.html – stefaanv Jun 28 '11 at 13:41
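
For reference, the Boost class mentioned above can be used roughly like this (assuming Boost is installed; a sketch, not a drop-in solution):

#include <boost/circular_buffer.hpp>
#include <iostream>

int main()
{
    boost::circular_buffer<double> window(3); // fixed capacity of 3

    window.push_back(1.0);
    window.push_back(2.0);
    window.push_back(3.0);
    window.push_back(4.0); // buffer is full, so this overwrites the oldest value (1.0)

    // Prints: 2 3 4
    for (std::size_t i = 0; i < window.size(); ++i)
        std::cout << window[i] << ' ';
    std::cout << '\n';
    return 0;
}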

Just as a tip: if you don't need to keep track of the individual values, there is a great algorithm that is very lightweight (I even use it on 8-bit micros) and accurate.

#include <cmath> // for std::sqrt

class RunningStat
{
public:
    RunningStat() : m_n(0) {}

    void Clear()
    {
        m_n = 0;
    }

    void Push(double x)
    {
        m_n++;

        // See Knuth TAOCP vol 2, 3rd edition, page 232
        if (m_n == 1)
        {
            m_oldM = m_newM = x;
            m_oldS = 0.0;
        }
        else
        {
            m_newM = m_oldM + (x - m_oldM)/m_n;
            m_newS = m_oldS + (x - m_oldM)*(x - m_newM);

            // set up for next iteration
            m_oldM = m_newM; 
            m_oldS = m_newS;
        }
    }

    int NumDataValues() const
    {
        return m_n;
    }

    double Mean() const
    {
        return (m_n > 0) ? m_newM : 0.0;
    }

    double Variance() const
    {
        return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 );
    }

    double StandardDeviation() const
    {
        return std::sqrt( Variance() );
    }

private:
    int m_n;
    double m_oldM, m_newM, m_oldS, m_newS;
};

This algorithm was created by B. P. Welford and is presented in Donald Knuth's The Art of Computer Programming, Vol. 2, 3rd edition, page 232.

http://www.johndcook.com/standard_deviation.html
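
A minimal usage sketch (the sample values and the quoted results are just for illustration):

#include <cstdio>

int main()
{
    RunningStat rs;
    const double samples[] = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };
    const int count = sizeof(samples) / sizeof(samples[0]);

    for (int i = 0; i < count; ++i)
        rs.Push(samples[i]);

    // For these samples: mean = 5.0, sample variance ~= 4.571, std dev ~= 2.138
    std::printf("n=%d mean=%f var=%f sd=%f\n",
                rs.NumDataValues(), rs.Mean(), rs.Variance(), rs.StandardDeviation());
    return 0;
}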

Vinicius Kamakura
  • I've been looking into whether this can be adapted to use a fixed-size window for variance. I am trying to take off the component due to the oldest term and introduce the new one. My initial tests suggest that it will work. Does anyone know whether this is a dead end? Using this I will need to keep the data in a circular buffer, but a simple incremental sum will avoid having to do lots of averaging (see the sketch below these comments). – DanS Jun 29 '11 at 22:55
  • Looking at this, it appears to calculate the mean (etc.) for the full dataset at each step. That is, it does not calculate the *rolling* mean requested in the question. Calculating a rolling mean this way is harder, because it is then also necessary to keep track of which values drop out of the window at each step. Just in case that's not clear. – eff May 31 '23 at 10:05
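
A minimal sketch of the windowed approach DanS describes above, keeping the samples in a circular buffer and maintaining running sums (the class name and the naive sum-of-squares formula are illustrative; that formula can lose precision through cancellation on large values, which is exactly what Welford's update avoids):

#include <cstddef>
#include <vector>

// Rolling mean/variance over a fixed window using incremental sums.
class RollingStat
{
public:
    explicit RollingStat(std::size_t window)        // window must be > 0
        : m_data(window, 0.0), m_head(0), m_count(0), m_sum(0.0), m_sumSq(0.0) {}

    void Push(double x)
    {
        if (m_count == m_data.size())
        {
            // Window is full: remove the oldest value's contribution first.
            double old = m_data[m_head];
            m_sum   -= old;
            m_sumSq -= old * old;
        }
        else
        {
            ++m_count;
        }
        m_data[m_head] = x;                     // overwrite the oldest slot
        m_head = (m_head + 1) % m_data.size();
        m_sum   += x;
        m_sumSq += x * x;
    }

    double Mean() const
    {
        return (m_count > 0) ? m_sum / m_count : 0.0;
    }

    double Variance() const // sample variance over the current window
    {
        if (m_count < 2) return 0.0;
        double mean = Mean();
        double var = (m_sumSq - m_count * mean * mean) / (m_count - 1);
        return (var > 0.0) ? var : 0.0;          // clamp small negative rounding errors
    }

private:
    std::vector<double> m_data; // circular buffer, allocated once
    std::size_t m_head;         // next write position (the oldest slot when full)
    std::size_t m_count;        // number of values currently in the window
    double m_sum, m_sumSq;      // running sum and sum of squares of the window
};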

The specification leaves implementation details to the vendor. However, since insertion at both ends is efficient, it is most likely implemented as a linked structure on the heap. That being said, when you pop something off the deque, the element should get destroyed, so your total memory usage shouldn't climb.

jpm
  • A “linked structure” doesn’t cut it since the deque guarantees O(1) indexing access. Some details are left to the vendors but a vector-based page implementation is pretty much required. – Konrad Rudolph Jun 28 '11 at 14:24