Erlang Compiler - Pattern Matching Performance and Underlying Data Structure

Question

I am wondering what the underlying data structure used is, and what the performance of pattern matching is? In particular, compared to the performance of searching a Trie.

Update: I am looking for a concise and precise understanding of what pattern matching implemented by the Erlang compiler is. What is the underlying data structure, and how efficient is a search for a pattern?

See the answers to these questions: - http://stackoverflow.com/questions/586362/pattern-matching-implementation - http://stackoverflow.com/questions/2908357/how-does-pattern-matching-work-behind-the-scenes-in-f — RichardC, Feb 11 '17 at 08:06
Thanks for aggregating those links. I actually have read both of them before posting this and determined that even though they were useful, they did not directly answer the question I had. I am hoping that someone familiar with the way Erlang implements pattern matching will see this and elucidate that algorithmic complexity of the implementation in relation to a Trie. — suprafly, Feb 13 '17 at 17:42

score 1 · Accepted Answer · answered Feb 15 '17 at 22:34

Pattern matching compilation does not have an "underlying data structure" in itself - it is just a strategy for decomposing any given data structure according to a set of patterns and minimizing the number of steps needed to tell whether a match is found, or whether no match is possible.

If your input is a string and the patterns are prefixes of that string, then the behaviour is just like that of trie search. Taking the example from https://en.wikipedia.org/wiki/Trie and expressing it as an Erlang case switch:

case String of
    "tea" ->  3;
    "ted" ->  4;
    "inn" ->  5;
    "to"  ->  7;
    "in"  ->  9;
    "i"   -> 11;
    "ten" -> 12;
    "A"   -> 15
end

Since there are no guard expressions to complicate the clauses, the compiler is free to reorder them for better efficiency (sorting them by type and value), so that all the patterns that share the same prefix are adjacent. This is convenient for the programmer, who doesn't have to care about manually keeping the list in order.

After that, the compiler will turn the set of clauses into a number of nested smaller case expressions performing a minimal number of tests. First, it would check whether the first character is an A, an i, or a t. If not, there can be no match, otherwise branch to examine the rest of the string; e.g., if the first character was an i, check if the next character is an n or the end of the string. Again, if it is neither, then there can be no match, otherwise branch again. And so on, generating code to examine all branches of all the patterns.

So those clauses that the compiler generates become a data structure. The question is- what data structure is being used? If it is searching branches, what kind of a tree structure are we dealing with? I am not looking for a general explanation of the principles of pattern matching here. I am looking for an exact and precise understanding of how the Erlang compiler optimizes the compilation of pattern matching into a data structure, and the time complexity required to search that compiled data structure. — suprafly, Feb 16 '17 at 19:52
No - the clauses and patterns become code; just a bunch of nested if-then-else, that performs the search. The only data structure is the input to the whole case switch. Unless you want to consider code as a "data structure", there is not much else to say. — RichardC, Feb 17 '17 at 18:56

Erlang Compiler - Pattern Matching Performance and Underlying Data Structure

1 Answers1