0

Currently, I've stored the strings into an array, and I've scanned the entire array for strings that start with the substring. ["stackoverflow", "stackunderflow"] If I search for "stack", I get both strings, as they both start with "stack". If I search for "stacko", I get one string, "stackoverflow".

However, this is really slow, so what would be a more efficient method and how would I implement it? Also, I've tried searching around, and a lot of people say a Trie would be a lot faster, but how would I implement one?

Pyr0Sh4rk
  • 49
  • 2
  • https://medium.com/@fro_g/writing-a-simple-inverted-index-in-python-3c8bcb52169a – Amir Afghani Nov 08 '20 at 17:09
  • The easiest way to speed up considerably is to store array as sorted array. If you're not allowed to store sorted array then you may at least store indexes that sort array. Then in sorted array you can search for prefix just in `O(log n)` time by doing binary search algorithm. – Arty Nov 08 '20 at 17:09
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). "Show me how to solve this coding problem?" is off-topic for Stack Overflow. Since you have not yet researched the algorithm nor attempted to code the problem yourself, you do not yet have a Stack Overflow question. – Prune Nov 08 '20 at 17:37
  • @Pyr0Sh4rk I implemented my binary search approach that I mentioned in previous comment, [here is the code](https://cutt.ly/RgLPG4v), try it! It basically creates 100000 random strings and does 10000 searches in it using binary search, and measures time. Time to search one string was 18 micro-seconds, so very fast! – Arty Nov 09 '20 at 05:23
  • @Pyr0Sh4rk I decided to improve my code, in [my second version of code](https://repl.it/@moytrage/StackOverflow64740781#main.py) I implemented both, binary search algorithm and trie algorithm. Trie appeared to be faster, but not much, Trie search time is 3.5 mcs (micro-seconds) while Binary search time is 20 mcs, on my PC. There is usage example of my search function inside code. basically you do `sr = searcher(all_strings, type_ = 'trie')` one time and later search many times prefix by `is_found, pos = sr(prefix)` and if `is_found` is true then we found answer and `all_strings[pos]` is result – Arty Nov 09 '20 at 12:28
  • Continuing of previous comment. Also if `is_found` is false then we didn't find exact result, but the closest answer that we have found is `all_strings[pos]`, closest means that it has longest common prefix with resulting string `all_strings[pos]` and `prefix`. – Arty Nov 09 '20 at 12:32

0 Answers0