19

I just asked a question about how the Erlang compiler implements pattern matching, and I got some great responses, one of which is the compiled bytecode (obtained with a parameter passed to the c() directive):

{function, match, 1, 2}.
  {label,1}.
    {func_info,{atom,match},{atom,match},1}.
  {label,2}.
    {test,is_tuple,{f,3},[{x,0}]}.
    {test,test_arity,{f,3},[{x,0},2]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {test,is_eq_exact,{f,3},[{x,1},{atom,a}]}.
    return.
  {label,3}.
    {badmatch,{x,0}}

Its all just plain Erlang tuples. I was expecting some cryptic binary thingy, guess not. I am asking this on impulse here (I could look at the compiler source but asking questions always ends up better with extra insight), how is this output translated in the binary level?

Say {test,is_tuple,{f,3},[{x,0}]} for example. I am assuming this is one instruction, called 'test'... anyway, so this output would essentially be the AST of the bytecode level language, from which the binary encoding is just a 1-1 translation?

This is all so exciting, I had no idea that I can this easily see what the Erlang compiler break things into.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
deepblue
  • 8,426
  • 13
  • 48
  • 60

2 Answers2

12

ok so I dug into the compiler source code to find the answer, and to my surprise the asm file produced with the 'S' parameter to the compile:file() function is actually consulted in as is (file:consult()) and then the tuples are checked one by one for further action(line 661 - beam_consult_asm(St) -> - compile.erl). further on then there's a generated mapping table in there (compile folder of the erlang source) that shows what the serial number of each bytecode label is, and Im guessing this is used to generate the actual binary signature of the bytecode. great stuff. but you just gotta love the consult() function, you can almost have a lispy type syntax for a random language and avoid the need for a parser/lexer fully and just consult source code into the compiler and do stuff with it... code as data data as code...

deepblue
  • 8,426
  • 13
  • 48
  • 60
  • 3
    Have you looked at Robert Virding's Lisp-Flavoured Erlang (http://forum.trapexit.org/viewtopic.php?p=40268) – Gordon Guthrie Feb 26 '09 at 16:54
  • yeah I've seen it, havent used it yet although its pretty high on my list of things to do in the short term. thanks tho – deepblue Feb 26 '09 at 17:40
5

The compiler has a so-called pattern match compiler which will take a pattern and compile it down to what is essentially a series of branches, switches and such. The code for Erlang is in v3_kernel.erl in the compiler. It uses Simon Peyton Jones, "The Implementation of Functional Programming Languages", available online at

http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

Another worthy paper is the one by Peter Sestoft,

http://www.itu.dk/~sestoft/papers/match.ps.gz

which derives a pattern match compiler by inspecting partial evaluation of a simpler system. It may be an easier read, especially if you know ML.

The basic idea is that if you have, say:

% 1
f(a, b) ->
% 2
f(a, c) ->
% 3
f(b, b) ->
% 4
f(b, c) ->

Suppose now we have a call f(X, Y). Say X = a. Then only 1 and 2 are applicable. So we check Y = b and then Y = c. If on the other hand X /= a then we know that we can skip 1 and 2 and begin testing 3 and 4. The key is that if something does not match it tells us something about where the match can continue as well as when we do match. It is a set of constraints which we can solve by testing.

Pattern match compilers seek to optimize the number of tests so there are as few as possible before we have conclusion. Statically typed language have some advantages here since they may know that:

-type foo() :: a | b | c.

and then if we have

-spec f(foo() -> any().
f(a) ->
f(b) ->
f(c) ->

and we did not match f(a), f(b) then f(c) must match. Erlang has to check and then fail if it doesn't match.

I GIVE CRAP ANSWERS
  • 18,739
  • 3
  • 42
  • 47