0

I am trying to count occurrences of every word in text. So I have stored all words and counts in binary tree:

typedef struct Node{
         char* word;
         int count;
         struct Node *left;
         struct Node *right;
         struct Node *parent;
} Node;

Now I need to sort tree by number of count. I can't just do while cycle and sort it, so I am wondering, which way I can do it?

Here is example of what I have now:

                               The - 3
                             /       \
                    Project - 1      of - 3
                     /    \          /    \ 
                 ....     ....      ....    ....

And I need to print top N words in text.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Vadim Tor
  • 40
  • 9

2 Answers2

1

Traverse the tree and extract the word and its count into an array of these:

struct WordAndCount {
  char * word;
  int count;
};

Then use qsort to sort the array. You will need a custom compare function that compares WordAndCount.count;

pm100
  • 48,078
  • 23
  • 82
  • 145
  • At my first try I have implemented whole application by using array of structs, but is works really slow when I am sorting a big amount of words, so I am trying to implement it with log(n) speed. – Vadim Tor Apr 12 '18 at 17:52
  • @VadimTor: This answer is correct: You need a second data structure to find the N elements with the largest count. Your binary tree is good for finding and inserting words, but you cannot change the sorting criterion later. Building an array and sorting is straightforward, but there are [better techniques](https://stackoverflow.com/questions/7272534/finding-the-first-n-largest-elements-in-an-array), especially if N is small with regard to the number of different words. – M Oehm Apr 12 '18 at 18:25
  • qsort has nlog(n) speed which is pretty good. https://en.wikipedia.org/wiki/Quicksort – pm100 Apr 12 '18 at 18:43
  • Yes, but [quickselect](https://en.wikipedia.org/wiki/Quickselect) has O(n) on average, (but O(n²) in the worst case). Both qsort and quickselect have the disadvantage that you need to build a large array. If N is small, it might be better to use a min-heap of N elements or just keep a sorted array of N elements with insertion sort. These methods are arguably better, but qsort can be used out of the box, so it's a good solution for this exercise. – M Oehm Apr 12 '18 at 19:30
0

What is your criteria for storing items in the tree? Is it the case that nodes to the left always have lower count than nodes to the right? If so, to get the top N words you do a post-order traversal keeping a counter and stop it when it reaches N.

Hisham H M
  • 6,398
  • 1
  • 29
  • 30
  • I think the tree is a binary tree sorted by the words, so that it's easy to find words and insert new words. It is built as the text is scanned. (In the axample, "The" is smaller than "of", because it starts with a capital T.) The other answer is right: You need a second data structure to find the top k elements. – M Oehm Apr 12 '18 at 18:07
  • I am using strcmp function to compare words and than store them to the tree. @MOehm guessed right :) – Vadim Tor Apr 12 '18 at 19:02