1

I have to build a tf-idf based search engine and I have a large amount of data to build the model upon. I have to answer prefix based queries in the engine and hence, I would like to find a data structure that would take the least amount of space while delivering low search time.

I have read about tries, ternary search trees, B+ trees, and Directed Acyclic Word Graphs. I have taken a look at the theoretical space and time complexity for the data structures but confused about their real world performance.

Which is the best data structure to use for the above task?

  • I am leaning to close as dupe of [What is the best data structure for text auto completion](https://stackoverflow.com/q/9471823/572670). Can you please tell if this is actually what you are asking, or if there are caveats relevant here and not there? – amit Oct 17 '20 at 17:40
  • The answer is relevant to the question that I asked. But I am much more constrained in terms of space complexity and hence, Trie based approaches cannot be used due to higher storage requirement. – shubh gupta Oct 17 '20 at 19:01
  • I would have recommended closing because the theoretical approaches exist as real world tradeoffs on constraints that you have not described. So there would be no good answer. – btilly Oct 17 '20 at 20:23

0 Answers0