-1

I am investigating data structures to satisfy O(1) get operations and came across a structure called Trie.

I have implemented the below simple Trie structure to hold numbers (digits only).

Ignore the memory leak - it is not the topic here :)

The actual storage in the Data class is not related as well.

#include <sstream>
#include <string>

struct Data
{
    Data(): m_nData(0){}
    int m_nData;
};

struct Node
{
    Node(): m_pData(NULL)
    {
        for (size_t n = 0; n < 10; n++)
        {
            digits[n] = NULL;
        }
    }

    void m_zAddPartialNumber(std::string sNumber)
    {
        if (sNumber.empty() == true)   // last digit
        {
            m_pData = new Data;
            m_pData->m_nData = 1;
        }
        else
        {
            size_t nDigit = *(sNumber.begin()) - '0';
            if (digits[nDigit] == NULL)
            {
                digits[nDigit] = new Node;
            }

            digits[nDigit]->m_zAddPartialNumber(sNumber.substr(1, sNumber.length() - 1));
        }        
    }
    Data* m_pData;
    Node* digits[10];
};

struct DB
{
    DB() : root(NULL){}

    void m_zAddNumber(std::string sNumber)
    {
        if (root == NULL)
        {
            root = new Node;
        }
        root->m_zAddPartialNumber(sNumber);        
    }

    Node* root;
};


int main()
{
    DB oDB;

    for (size_t nNumber = 0; nNumber <= 10000; nNumber++)
    {
        std::ostringstream convert;
        convert << nNumber;
        std::string sNumber = convert.str();

        oDB.m_zAddNumber(sNumber);
    }

    return 0;
}

My main function is simply inserting numbers into the data structure. I've examined the overall memory allocated using Windows task manager and came across an interesting feature i can't explain and am seeking your advice.

I've re-executed my simple program with different numbers inserted to the structure (altering the for loop stop condition) - here is a table of the experiment results:

Experiment data

Plotting the numbers in a logarithmic scaled graph reveals:

Experiment graph result

As you can see, the graph is not linear.

My question is why?

I would expect the allocation to behave linear across the range.

Community
  • 1
  • 1
NirMH
  • 4,769
  • 3
  • 44
  • 69
  • 2
    My guess: At the lower end, the memory usage is dominated by other things in your program (IO streams, general heap management structures, whatever other things your runtime needs). The heap probably allocates and initial large block, and distributes from that. At the upper end, it's dominated by the allocations for your structure, and appears quite linear at roughly 0.3 KB per number. From 10000 up, every time you multiply the number N by 10, you increase the size by 10x. That appears to be asymptotically linear in the long run. – Dave S Jun 09 '14 at 11:09
  • What is the "Heap Reserve Size" in the linker settings of your project's properties? – harper Jun 09 '14 at 11:38
  • @harper - an empty value - no settings at all – NirMH Jun 09 '14 at 12:38

1 Answers1

1

A linear relation of y on x is of the form y=a+bx. This is a straight line in a y vs x plot, but not in a log(y) vs log(x) plot, unless the constant a=0. So, I conjecture that your relation may still be (nearly) linear with a~340 kB.

Walter
  • 44,150
  • 20
  • 113
  • 196
  • due to the difference in Y axe values, i switch y axe to logarithmic scale. Therefore linear would have been a~340KB – NirMH Jun 09 '14 at 11:31