I am trying to write a specialized XML parser for an API, and I was wondering whether I can get it working without the existing XML parsers such as xmerl. How feasible would it be to implement it using only bit syntax, and is there any online documentation that shows how to get started parsing XML this way?

hyperboreean

1 Answer

It is not feasible. XML parsers exist for a reason; if it were feasible, dedicated parsers would not be needed. Bit syntax is only good when the order of the bits/bytes is fixed. XML does not mandate the order of attributes, and most people don't realize that the XML spec does not mandate the order of sibling elements either. So a bit-syntax match would not work with all possible XML inputs, given the unordered nature of attributes alone, much less unordered sibling elements. Just use an XML parser; this isn't a hill you want to die on.
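
To make the attribute-order problem concrete, here is a minimal sketch. The module name `bit_match_demo`, the function `user_id/1`, and the `<user .../>` element are all hypothetical, made up for illustration. It matches one fixed byte layout with binary pattern matching, which is roughly what a bit-syntax-only "parser" would end up doing.

```erlang
-module(bit_match_demo).
-export([user_id/1]).

%% Hypothetical helper: extracts the id from ONE hard-coded layout,
%% <user id="..." ...>, using only binary pattern matching.
user_id(<<"<user id=\"", Rest/binary>>) ->
    %% Split at the first closing quote to isolate the attribute value.
    case binary:split(Rest, <<"\"">>) of
        [Id, _Tail] -> {ok, Id};
        [_NoClosingQuote] -> error
    end;
user_id(_Other) ->
    %% Any other byte layout falls through, even if it is equivalent XML.
    error.
```

With this sketch, `bit_match_demo:user_id(<<"<user id=\"42\" name=\"bob\"/>">>)` returns `{ok,<<"42">>}`, while the equivalent document `<<"<user name=\"bob\" id=\"42\"/>">>` returns `error`, because the bytes no longer sit where the pattern expects them. Extra whitespace or single-quoted attributes would break it in the same way.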

  • @hyperboreean: Agreed on all of the above. Use Erlang pattern matching to help traverse the parse result after e.g. xmerl_scan:file/2 ... – Peer Stritzinger Feb 02 '11 at 16:44
  • OK, the answer above makes total sense; I don't know what I was thinking. I guess I wanted something very specialized that parses the expected protocol very fast, and that's why I thought of bit syntax. – hyperboreean Feb 02 '11 at 19:49
  • @Peer Stritzinger: Will do that. Do you have any benchmarks on xmerl_scan:file/2? I am looking for something really fast. – hyperboreean Feb 02 '11 at 19:50
  • @hyperboreean: Sorry, no benchmarks here, but the results would depend on the data anyway. Put some of your XML in a file, call xmerl_scan:file in the Erlang shell, and benchmark that. It won't take much time and you get meaningful results right away. – Peer Stritzinger Feb 03 '11 at 10:37
  • @hyperboreean: "very fast" and XML are almost mutually exclusive at a macro level when compared to a specialized bit-packed protocol, and even then you get into a time/space trade-off: tightly packed protocols can cost more time to parse than loosely packed ones. So XML performance is only relative, depending on the grammar of the XML stanzas, the parser, and what you are doing with the result. –  Nov 01 '11 at 18:00
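
The approach suggested in the comments (parse with xmerl, traverse the result with pattern matching, then time it on your own data) can be sketched roughly as follows. The module name `xmerl_demo`, the functions `item_names/1` and `timed/1`, and the `<items>`/`<item>` document shape are assumptions made up for illustration. The sketch uses `xmerl_scan:string/1` instead of `xmerl_scan:file/2` only so that it needs no file on disk.

```erlang
-module(xmerl_demo).
-export([item_names/1, timed/1]).

%% Record definitions (#xmlElement{}, #xmlAttribute{}) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string and collect the "name" attribute of every <item>
%% child of the root element, regardless of attribute order.
item_names(XmlString) ->
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(XmlString),
    %% Pattern generators silently skip nodes that do not match,
    %% e.g. #xmlText{} whitespace between elements.
    [Value || #xmlElement{name = item, attributes = Attrs} <- Content,
              #xmlAttribute{name = name, value = Value} <- Attrs].

%% Rough wall-clock timing of parse plus traversal, in microseconds.
timed(XmlString) ->
    timer:tc(fun() -> item_names(XmlString) end).
```

For example, `xmerl_demo:item_names("<items><item id=\"1\" name=\"a\"/><item name=\"b\" id=\"2\"/></items>")` returns `["a","b"]` even though the two items list their attributes in different orders. `timed/1` returns `{Microseconds, Names}`; a single call is only a rough indication, so run it repeatedly on representative input before drawing conclusions.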