0

I need to construct a Binary Search Tree from a file that has more than 2 million lines (each line will give me a pair key/val). Since the data is ordered, if I just read one line, get the key and val and add to my tree, the height will be huge so that the tree will be inefficient to search. So, I was thinking if there is a good way to construct this search tree so that it doesnt have a huge height. My attempt was to get the first 100.000 keys, shuffle, put on tree and so on, but it doesnt seems much efficient. Any suggestion?

P.S: I have to use a not balanced search tree.

Thanks !

Giiovanna
  • 424
  • 6
  • 21
  • If there data is already ordered, you're likely better off just using an array or `List` and doing a binary search. Otherwise you need to construct a balanced tree or else you'll just get a mess... – Boris the Spider May 29 '15 at 14:10
  • as I said, I need to use Binary Search Tree unbalanced. – Giiovanna May 29 '15 at 14:15
  • This might interest you http://stackoverflow.com/questions/30521757/would-this-algorithm-run-in-on – dejvuth May 29 '15 at 14:19

2 Answers2

1

If you can read the file multiple times you can read the file the first time and read say 1000 entries (i.e one every 2000 rows) in al list and then make a first balanced insertion so you insert first the element at position 500 then two at position 250 and 750 then positions 4 at positions 125,375,625,975, etc. After the first pass you can read the whole file (and managing the duplicates) and get a more balanced tree.

An alternative is not to use a BinarySearchTree at all, but an Array, since the data are ordered you can use binary search (you check the value at the middle of the array and if the value you get is bigger you repeat the operation with the first half of the list, it ithe value is lower you use the second half of the list); but I don't know if using a List meets your requirements.

Giovanni
  • 3,951
  • 2
  • 24
  • 30
0

As a side note, creating a BST when you're already handed a sorted array is kind of a crazy thing to do, but with that aside...

If you're given a sorted array already, it's practically giving you the answer for how to construct a balanced BST with a minimum height. For simplicity, let's imagine the array is:

[0,1,2,3,4,5,6,7,8,9,10]

In such a case, what would be the optimal element to store at the root for a balanced tree? The natural answer is the middle of the list, 5.

So then we're left with two sub-ranges of the array:

i<5: [0,1,2,3,4]
i>5: [6,7,8,9,10]

So what is be the ideal element to store at the left child? Again we take the center of the left child list (i<5), and that would be 2, and we have two sub-ranges of that array:

i<2: [0,1]
i>2: [3,4]

And we can repeat this logic recursively until we're left with a single child or none in both ranges, at which point we've made a leaf node.

Applied to both sides of every branch recursively, drilling down to the leaves, this will give you that optimal balanced tree.