11

I'd like to parse C header files in JavaScript. Is there any such library available? Otherwise, any tips to help me get started?

Update: My ultimate goal is to automatically build interfaces for node-ffi. The parser doesn't necessarily have to be written in JavaScript, as long as it can output a format that JavaScript can read. If it's very hard to develop one myself, I'll probably have to go with an off-the-shelf solution...?

simon-p-r
Olivier Lalonde
  • 2
    err i really don't understand the question... parse a HEADER file? to what purpose – Aniket Inge Nov 01 '12 at 05:48
  • 1
    I hate to say it this way, but... are you **sure** you want to do that? Parsing C's syntax is notoriously hard, even if you didn't have to deal with C pre-processor macro expansion and includes. – Jamey Sharp Nov 01 '12 at 05:49
  • @JameySharp writing a CPreProcessor that expands the macros and includes files is extremely easy compared to parsing the rest of the syntax of C. – Aniket Inge Nov 01 '12 at 05:52
  • 1
    Parsing is a *huge* subject. Which C standard are you aiming for? What do you want to parse it to? Why do you even want to do that? Furthermore, have you any background experience in parsing? – Zirak Nov 01 '12 at 05:53
  • I don't usually downvote. I will be forced to downvote this question if the OP doesn't bother clarifying – Aniket Inge Nov 01 '12 at 05:54
  • @Aniket Hah, fair enough. :-) – Jamey Sharp Nov 01 '12 at 05:54
  • 1
    When it comes to pure parsing of C source or headers, like just creating an AST, I find it relatively trivial compared to most other languages. C is actually a very simple language in that way. However, if you don't know what is meant by terms like "AST" or "recursive descent" you definitely have a bit of a learning curve in front of you. If you explain the _reason_ you want to do this we might be able to help you better. – Some programmer dude Nov 01 '12 at 06:41
  • Yes, I believe an AST would be well enough for my purpose. – Olivier Lalonde Nov 01 '12 at 07:44

3 Answers

8

You should check out clang.

For a simple command-line invocation, you can try this:

clang -cc1 -ast-dump-xml myfile.h

Or you can build your own tool using clang's reasonably well-documented parser library, which will build an AST for you and let you walk it as you see fit (perhaps to output JSON).
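As a rough illustration of walking such an AST from JavaScript, here is a minimal sketch. It assumes a JSON dump rather than the XML one shown above: newer clang releases can, as far as I know, emit one with `clang -Xclang -ast-dump=json -fsyntax-only myfile.h`, while the `-ast-dump-xml` option may no longer be available in them. The helper name `collectFunctions` is hypothetical, and the node fields it reads (`kind`, `name`, `type.qualType`, `inner`) are an assumption about that JSON layout that should be checked against the output of your clang version.

```javascript
// Sketch only: walks an AST object shaped like clang's JSON dump
// (assumed fields: kind, name, type.qualType, inner).
function collectFunctions(node, out = []) {
  if (node.kind === 'FunctionDecl') {
    // Parameters are assumed to appear as ParmVarDecl children.
    const params = (node.inner || [])
      .filter((child) => child.kind === 'ParmVarDecl')
      .map((p) => ({
        name: p.name || null, // unnamed parameters have no name field
        type: p.type ? p.type.qualType : null,
      }));
    out.push({
      name: node.name,
      type: node.type ? node.type.qualType : null,
      params,
    });
  }
  // Recurse into every child node, whatever its kind.
  for (const child of node.inner || []) {
    collectFunctions(child, out);
  }
  return out;
}
```

With the dump saved to a file, something like `collectFunctions(JSON.parse(fs.readFileSync('ast.json', 'utf8')))` would return one entry per function declaration, which could then be mapped to node-ffi signatures.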

rici
4

You might start by looking at peg.js, which generates JavaScript code for a parser from a grammar given as input. Details are available at https://pegjs.org/

Then you would need to write or find a grammar for the header files you want to parse.

Marecky
HBP
2

Well I'll answer my own question since I found something interesting:

http://www.swig.org/Doc2.0/SWIGDocumentation.html#SWIG_nn2

SWIG can output an XML representation of C header files, which I could then load from JavaScript.

For example:

swig -module yaml -xmlout yaml.xml yaml.h

Generates the following file (snippet below for the yaml_token_delete function):

...

<cdecl id="16015" addr="0x10835d500" >
    <attributelist id="16016" addr="0x10835d500" >
        <attribute name="name" value="yaml_token_delete" id="16017" addr="0x1082b2d00" />
        <attribute name="sym_symtab" value="0x1081007e0" id="16018" addr="0x1081007e0" />
        <attribute name="view" value="globalfunctionHandler" id="16019" addr="0x1082b2d00" />
        <attribute name="kind" value="function" id="16020" addr="0x1082b2d00" />
        <attribute name="sym_name" value="yaml_token_delete" id="16021" addr="0x1082b2d00" />
        <attribute name="wrap_parms" value="0x10835d460" id="16022" addr="0x10835d460" />
        <attribute name="decl" value="f(p.yaml_token_t)." id="16023" addr="0x1082b2d00" />
        <attribute name="tmap_out" value="" id="16024" addr="0x1082b2d00" />
        <parmlist id="16025" addr="0x10835d460" >
            <parm id="16026">
                <attributelist id="16027" addr="0x10835d460" >
                    <attribute name="tmap_typecheck" value="void *vptr = 0;&#10;  int res = SWIG_ConvertPtr($input, &amp;vptr, SWIGTYPE_p_yaml_token_s, 0);&#10;  arg1 = SWIG_CheckState(res);" id="16028" addr="0x1082b2d00" />
                    <attribute name="tmap_typecheck_match_type" value="p.SWIGTYPE" id="16029" addr="0x1082b2d00" />
                    <attribute name="tmap_in_match_type" value="p.SWIGTYPE" id="16030" addr="0x1082b2d00" />
                    <attribute name="tmap_freearg_match_type" value="p.SWIGTYPE" id="16031" addr="0x1082b2d00" />
                    <attribute name="compactdefargs" value="1" id="16032" addr="0x1082b2d00" />
                    <attribute name="name" value="token" id="16033" addr="0x1082b2d00" />
                    <attribute name="emit_input" value="objv[1]" id="16034" addr="0x1082b2d00" />
                    <attribute name="tmap_typecheck_precedence" value="0" id="16035" addr="0x1082b2d00" />
                    <attribute name="tmap_in_numinputs" value="1" id="16036" addr="0x1082b2d00" />
                    <attribute name="tmap_in" value="res1 = SWIG_ConvertPtr(objv[1], &amp;argp1,SWIGTYPE_p_yaml_token_s, 0 |  0 );&#10;  if (!SWIG_IsOK(res1)) { &#10;    SWIG_exception_fail(SWIG_ArgError(res1), &quot;in method '&quot; &quot;$symname&quot; &quot;', argument &quot; &quot;1&quot;&quot; of type '&quot; &quot;yaml_token_t *&quot;&quot;'&quot;); &#10;  }&#10;  arg1 = (yaml_token_t *)(argp1);" id="16037" addr="0x1082b2d00" />

...
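To give an idea of how that XML could be consumed from JavaScript, here is a minimal sketch that pulls the `name` and `decl` attributes out of each function `<cdecl>`. It assumes the file is shaped like the snippet above; `extractFunctions` is a hypothetical helper, and the regex-based matching is only for illustration (a proper XML parser would be more robust).

```javascript
// Sketch only: extracts function declarations from SWIG -xmlout text,
// assuming the structure shown in the snippet above.
function extractFunctions(xml) {
  const results = [];
  // Everything after each <cdecl ...> opening tag; the text before the
  // first <cdecl> is dropped.
  const chunks = xml.split(/<cdecl\b[^>]*>/).slice(1);
  for (const chunk of chunks) {
    // Keep only the declaration's own attributes, which come before the
    // nested <parmlist> (or before the attribute list closes).
    const head = chunk.split(/<parmlist\b|<\/attributelist>/)[0];
    const attrs = {};
    const attrPattern = /<attribute\s+name="([^"]*)"\s+value="([^"]*)"/g;
    let match;
    while ((match = attrPattern.exec(head)) !== null) {
      attrs[match[1]] = match[2];
    }
    if (attrs.kind === 'function') {
      results.push({ name: attrs.name, decl: attrs.decl });
    }
  }
  return results;
}
```

For the snippet above this would yield an entry with the name `yaml_token_delete` and the `decl` string `f(p.yaml_token_t).`, which would still need to be translated into whatever type notation node-ffi expects.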
Olivier Lalonde