4

I'm writing a function that finds out the total number of items in a AVL tree by range. For example, the arguments that passed in is "ab" and "au", then I need to find out how many items they are in an AVL tree is in that range.

Currently my way of doing this is to traverse the tree every time when the client calls it. But because the number of items in my AVL tree is vary large, it takes forever if the client calls this function too many times. Is there a faster way to do that?

My range function:

void range(AvlTree T, char* k1, char* k2) {
    if ( T == NULL )
        return;

    if ( strcmp(k1, T->Element) < 0 )
        range(T->Left, k1, k2);

    if ( strcmp(k1, T->Element) <= 0 && strcmp(k2, T->Element) >= 0 )
        total++;

    if ( strcmp(k2, T->Element) > 0 )
        range(T->Right, k1, k2);
}
Yakov Galka
  • 70,775
  • 16
  • 139
  • 220
  • 1
    OT: Nothing to do with efficiency but... `total` must be a global variable!? Don't, don't, don't do that... – Support Ukraine Feb 13 '20 at 21:30
  • As @4386427 has said if it is a global variable, better to change to something else. Try to pass a reference to the function and use that as your counter. – ranu Feb 13 '20 at 21:41

1 Answers1

3

Your current algorithm has a complexity of O(M + log N) where N is the size of the tree and M is the number of elements within the range. I don't think that you can do any better with unaugmented AVL tree. So the solution would involve changing your tree implementation.

A straightforward way to do that is to store in each node the size of the subtree at that node. This information can be updated in constant time during tree rotation. Later it can be used to skip entire sub-trees as follows:

int range(AvlTree T, const char* k1, const char* k2) {
    assert(!k1 || !k2 || strcmp(k1, k2) <= 0);
    if(T == NULL)
        return 0;
    if(!k1 && !k2)
        return T->size;
    if(k2 && strcmp(k2, T->Element) < 0)
        return range(T->left, k1, k2);
    if(k1 && strcmp(T->Element, k1) < 0)
        return range(T->right, k1, k2);
    return range(T->left, k1, 0) + 1 + range(T->right, 0, k2);
}

This would give an O(log N) complexity.

DISCLAIMER: the code is untested.

Yakov Galka
  • 70,775
  • 16
  • 139
  • 220
  • Minor idea: lesser compilers would improve with knowing the `const`-ness, so `int range(const AvlTree T, const char* k1, const char* k2) {`. If anything a linear improvement, unlike this good answer's reduction. – chux - Reinstate Monica Feb 13 '20 at 22:10
  • @chux-ReinstateMonica: Thank you for the correction. To my defense I must say that I copied that part from the OP. However, [that would not improve the performance of the code](https://stackoverflow.com/q/6313730/277176). – Yakov Galka Feb 13 '20 at 22:27
  • Your reference relies on 'If the compiler knows enough about the caller of foo() and the contents of bar() that it can prove bar() does not modify *p, then it can also perform that proof without the const declaration.". My point was that some compilers have trouble figuring this out - so even if the compiler _could_, it does not unless `const` is present. Hence my CYA "lesser compilers". – chux - Reinstate Monica Feb 13 '20 at 22:31
  • IAC, `const` does allow greater application as `const char *end = "au"; range(T, "ab", end);` is needlessly not allowed in OP's `const`-less version. – chux - Reinstate Monica Feb 13 '20 at 22:33
  • What is `if(!k1 && !k2) return T->size;` do? Do I need to write a function that calculate the size of each node? – ryanchai_1995 Feb 14 '20 at 05:25
  • @ryanchai_1995: it returns the cached size in the case when both keys are NULL -- it's a base case of recursion. You don't calculate the size of each node during the search. Please read my answer carefully. You only maintain it during tree rotations when you insert/delete things from the AVL -- change your tree rotation functions accordingly. – Yakov Galka Feb 14 '20 at 14:21