0

This was an interview question: "Which data structure would you use to check if a record exists in a database?"

My immediate answer was Binary Search Tree.

The interviewer didn't comment and went on to the next question. What's the answer to this question?

Christophe
  • 68,716
  • 7
  • 72
  • 138
user1529412
  • 3,616
  • 7
  • 26
  • 42
  • The data structure used by relational databases is usually B-Tree. – RealSkeptic Feb 09 '15 at 22:38
  • 1
    Were you being asked about how you would *build* a database, or *use* one? – Scott Hunter Feb 09 '15 at 22:39
  • It was actually in the context of building a database. – user1529412 Feb 09 '15 at 22:40
  • In these kind of interviews, giving swiftly with confidence an acceptable answer is more important that giving the perfect answer. So you certainly marked points here ! This being said, a B-tree would have been a better answer (the b-tree is a generalisation of a binary tree: http://en.wikipedia.org/wiki/B-tree which can have more than two nodes) – Christophe Feb 09 '15 at 22:52

2 Answers2

2

There's a lot of answers to this question, and it all depends on what exactly this record contains and what you want to do with it.

I would have answered hash table, due to very fast search times for amortized cases( O(1) ). It also has the added benefit of fast insertions and deletions.

A binary search tree works well if you plan to do operations on the records as a whole (i.e n-th smallest salary) , but if all you're doing is searching the database for existence then you're looking at longer run-times for searching.

2

There are many acceptable answers, and in such interview, providing swiftly and with confidence an acceptable answer is more important than giving the perfect answer.

Binary trees is definitively a vaild anwer. So congratulations !

However for databases, B-trees (the "B" stands for "balanced") would be even more recomended. The B-tree is a generalisation of the binary tree where each node has more than two children. This make this data structure more efficient for optimising disk read access. The structure also needs less rebalancing than binary trees, meaning again less disk write access.

If you are interested in performance consideration, this SO answer makes an interesting comparison between both structures.

Now, just for records, in some application areas there are more specialised structures such as for example R-trees for 3D spatial data, or hash tables if you consider looking for unique keys and are ready to sacrifice some space for more speed.

Edit: Some example of popular databases (not exhaustive !):

  • sqllite uses b-trees (and has an r-tree extension)
  • BerkleyDB uses B-trees and hash indexes
  • MySQL uses B-trees and hashes (ans has r-trees as well)
  • Postgresql uses B-trees, r-tree, hashes and several others
  • SQLserver apparently uses B-trees as well
Community
  • 1
  • 1
Christophe
  • 68,716
  • 7
  • 72
  • 138
  • So, are databases implemented using Balanced Binary Search Trees? – user1529412 Feb 09 '15 at 23:03
  • Advanced databases use several indexing techniques to take advantage of data chararcteristics). But B-tree and its variants is heavily used. You can have for example a look at sqllite (https://www.sqlite.org/fileformat2.html#btree) – Christophe Feb 09 '15 at 23:16