0

I have created this simple BST in Ruby:

class Node
  attr_accessor :data, :left, :right
  
  def initialize data
    @data = data
  end
end

def insert root, data
  current = root
  
  loop do
    node = current.data < data ? current.right : current.left
    break if node.nil?
    current = node
  end
  
  if current.data < data
    current.right = Node.new(data)
  else
    current.left = Node.new(data)
  end
end

Note: I don't use recursion because I get a "stack too deep" exception when the BST becomes large.

Then I try to insert about 6M records:

root = Node.new('root')
insert root, "something"
insert root, "another"
# ...

It takes about 30 minutes to complete on my MacOS, which seems really too much!

Why it takes so long (with SQLite I can index in 30 seconds the same data)?

Can you suggest any optimization to the above code?

collimarco
  • 34,231
  • 36
  • 108
  • 142
  • Ruby is a slow language (compared to C or C++, which is what MySQL is written with). Also, this code probably does not play too well with cpu caches. Also, indexes in MySQL are BTree+ (IIRC), not BST. – Sergio Tulentsev Sep 08 '22 at 14:31
  • For storing and looking up strings, perhaps, a Trie would be a better data structure. – Sergio Tulentsev Sep 08 '22 at 14:34
  • 1
    Have you tried put it on an array, sort it and then perform binary search instead? I'm just wondering how does that compare in performance. – OscarRyz Sep 08 '22 at 14:36
  • have you checked how long the creation of 6M Nodes takes (without creating the tree structure)? Try to narrow down, where the time is spent. Also for later use you should ensure that your BSP is quite balances to have 'good' performance while searching . – MrSmith42 Sep 08 '22 at 15:17

2 Answers2

1

Is Ruby slow?

Here's a StackOverflow question with highly detailed answers. In short, comparing Ruby's performance with SQLite's is pointless; only one of these tools is optimised to work with data.

Optimizing the approach

Since you're using a BST, each insertion takes O(n) time in the worst case (which is a skewed tree). Inserting n nodes would have O(n^2) complexity, which might explain the high running time. You could replace the BST with a height balanced BST (also called AVL tree), where all insertion/deletion/search operations can be done in O(log n)

Abhinav Mathur
  • 7,791
  • 3
  • 10
  • 24
  • 4
    Good point, I think the chances are that the data he's inserting is pre-sorted (which results in this one-sided BST) – Sergio Tulentsev Sep 08 '22 at 15:09
  • I would like to point out that while ruby, as an interpreted language, is, and always will be, slower than static precompiled languages(in many but not all instances), the latest post in that link is from 2017. Ruby core team has made dramatic performance alterations since then for instance the touted [Ruby 3x3](https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/) – engineersmnky Sep 09 '22 at 22:06
0

I'm not sure if you strictly need a BST, but in case you can consider an alternative, having your data stored in an array and having it sorted (hopefully only once) would be way faster.

The following takes 2 seconds on my machine, whereas I stopped your BST implementation after 30 seconds.

data = []
6_000_000.times{|i|
  data << rand(1_000_000)
}
# data << 3000 #force 3000 to be there... 
data.sort! # only need to sort once
p data.bsearch{ |x| 3000 <=> x }

If you need to keep adding data, maybe re-sorting all the time would still be better.

OscarRyz
  • 196,001
  • 113
  • 385
  • 569