6

UPD: I moved original question to https://codereview.stackexchange.com/questions/127055/building-tree-graph-from-dictionary-performance-issues

Here is a short version, without codes.

I'm trying to build a prefix tree from dictionary. So, using the following dictionary 'and','anna','ape','apple', graph should look like this: graph I've tried 2 approaches: using associative array and using self-written tree/node classes.

Note: original dictionary is something about 8 MB and contains >600000 words.

Question: is there any good(fast/efficient) way to do it?

I've tried so far:

  • php associative arrays (they are not very flexible for future work with this graph).

  • self-written Tree/Node classes (performance issues - execution time rises by up to 7x, memory usage rises by 2x even without implementing anything except just inserting function).

Sample codes are available on codereview (the very first link in question)

Community
  • 1
  • 1
haldagan
  • 911
  • 5
  • 16
  • They both have the same code/execution complexity, not the same memory footprint and execution speed. Depending on the PHP version you run this under classes use more or less memory too. If you are looking for better performance and not just learning stuff I'd suggest looking into nested sets. You'll find ready to use PHP implementations too: http://stackoverflow.com/questions/272010/searching-for-the-best-php-nested-sets-class-pear-class-excluded – Sergiu Paraschiv Apr 29 '16 at 13:00
  • 2
    This question is better suited for [code review](http://codereview.stackexchange.com) – nickb Apr 29 '16 at 13:01
  • @Sergiu Paraschiv - I'll look into it – haldagan Apr 29 '16 at 13:08
  • @nickb I'm not actually asking to review my code, I'm basically asking "why is tree implementation on classes is MUCH MORE slower, than implementation on arrays?". The code is given to illustrate the problem. I actualy was expecting something near 2x speed drop. 8x+ just shocked me. – haldagan Apr 29 '16 at 13:12
  • Sure you are - A performance review is still a review. See the [tour of code review](http://codereview.stackexchange.com/tour) - "Ask about... The quality of your working code with regards to: Performance". – nickb Apr 29 '16 at 13:16
  • @nickb oh... Thanks for the information. Is there an easy way to move the question from here to there? – haldagan Apr 29 '16 at 13:19
  • I'm voting to close this question as off-topic because it has been cross-posted on a better suited Stack Exchange site. In future, please only post your question on a single Stack Exchange site. For more information, see [here](http://meta.stackexchange.com/q/64068) – Matt May 19 '16 at 12:15
  • @Matt I get your point, still I've modified my initial question right after moving it to codereview, so it stopped being a duplicate. In short terms: current question is "what are known ways to solve a problem?" and the one on codereview is "why are my solutions so slow?" (you can check codereview question and see the difference). – haldagan May 20 '16 at 06:26
  • @haldagan: ah, my apologies. But in its current form, this question is not suitable for Stack Overflow "is there any good(fast/efficient) way to do this" is primarily opinion based (good!?) and too broad (lots of ways of doing it). – Matt May 20 '16 at 11:21
  • @Matt Technically you are right, yet i've discovered only 2 widely-used techniques - saving trie as a list with pointers and using hashmaps. So basically there are no other sensible ways of building tries (and even if there were - it would be at least educating for me to learn those). As for "opinion based" - my question clearly states "good(*fast/efficient*)", so basically I was asking about known approaches that could beat naive `array` or simple OOP implementation in sense of time of execution and memory usage on a large dictionary. – haldagan May 20 '16 at 12:35
  • So good(*fast/efficient*) can not be based on one's opinion - only on hardware/code used. – haldagan May 20 '16 at 12:43

1 Answers1

0

As long as I've switched to C++ and got a good answer on codereview, I'll just answer my own question here.

There is one more way to do it way more time-efficient by increasing memory usage(it's not really big increase, compared to "array of arrays of arrays..." approach). The approach is called "double array trie" and you can read info on this topic here and read the aforementioned answer on codereview to see an example of implementation.

It's more time-efficient, yet it allows less flexibility/convenience for future trie use (compared to OOP approach).

So the final answer on this question for me is: "php is not the best tool to work with really big tries with".

Community
  • 1
  • 1
haldagan
  • 911
  • 5
  • 16