3

I need to parse a custom file format with C#. The format is the PBX file of an Xcode project. There is no official documentation for the format, but it is rather straightforward. Here is a simple example:

// !$*UTF8*$!
{
    archiveVersion = 1;
    classes = {
    };
    objectVersion = 46;
    objects = {

        /* Begin PBXBuildFile section */
        5143B90C1884374800F27FD8 /* Foundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90B1884374800F27FD8 /* Foundation.framework */; };
        5143B90E1884374800F27FD8 /* CoreGraphics.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90D1884374800F27FD8 /* CoreGraphics.framework */; };
        5143B9101884374800F27FD8 /* UIKit.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90F1884374800F27FD8 /* UIKit.framework */; };
        /* End PBXBuildFile section */
    };
    rootObject = 5143B9001884374800F27FD8 /* Project object */;
}

In the objects section there is a sequence of object definitions: each object's unique id followed by its properties. Note that the file contains comments, and that property values can also be enclosed in quotes.

A complete example of a PBX file is here.

Now I need to build a DOM of the file. What is the best approach to this kind of task?

alexey
  • 1
    You could certainly do this with a series of regexes – Dave Bish Jan 13 '14 at 16:24
  • 1
    This would be a good start: http://stackoverflow.com/questions/7557273/tutorial-or-guide-for-scripting-xcode-build-phases Of course, it depends on your requirements. What is the tool going to be for? Do you expect that the format is very fixed (ie. can you always assume that the line `objects = {` will not be written with `{` on the next line etc.)? If it's just some internal tool, you could probably get away with simply reading line by line and parsing in a simple stupid way (like `if (line.IndexOf("objects") != -1) ...`). – Luaan Jan 13 '14 at 16:26
  • 1
    Define best. You could do a grammar and parse it. Or you could loop through line by line and chop strings, and test for particular strings... – Tony Hopkinson Jan 13 '14 at 16:26
  • It shouldn't rely on specific positions of spaces and line breaks. It's a grammar, and I'm looking for a guide to parsing it. – alexey Jan 13 '14 at 16:53
  • It would also be cool if the parser could be used the way XmlReader works in .NET: reading token by token. – alexey Jan 13 '14 at 16:56

4 Answers

1

Use a parser (because of the nested braces, regex is a no-go). Pick the one whose syntax you feel comfortable with:

I guess you are new to this, which is why I grouped them by approach: top-down, bottom-up, and combinator. My personal preference is bottom-up, because defining mathematical expressions feels more natural to me that way, but you should not run into that kind of problem here.

As of 2014-01-28, NLT includes a simple reader for PBXProj files.

greenoldman
0

I've found that the Sprache project is really good for this type of grammar.

For simple parsing cases, regexes can be enough too.
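
As a rough illustration of the regex route, here is a sketch that only handles the flat, single-line PBXBuildFile entries from the sample in the question. The class name `PbxBuildFileScanner` is hypothetical, and the assumption that ids are always 24 uppercase hex digits is inferred from that sample, not from any specification. Nested or multi-line objects will not be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: maps each PBXBuildFile id to the id in its fileRef.
public static class PbxBuildFileScanner
{
    // Matches the start of one flat entry such as:
    //   <id> /* comment */ = {isa = PBXBuildFile; fileRef = <id> ...
    // Assumption: ids are 24 uppercase hex digits, as in the sample.
    private static readonly Regex Entry = new Regex(
        @"^\s*(?<id>[0-9A-F]{24})\s*(?:/\*.*?\*/)?\s*=\s*\{\s*isa\s*=\s*PBXBuildFile\s*;" +
        @"\s*fileRef\s*=\s*(?<ref>[0-9A-F]{24})",
        RegexOptions.Multiline);

    public static Dictionary<string, string> Scan(string text)
    {
        var map = new Dictionary<string, string>();
        foreach (Match m in Entry.Matches(text))
        {
            map[m.Groups["id"].Value] = m.Groups["ref"].Value;
        }
        return map;
    }
}
```

Calling `PbxBuildFileScanner.Scan(File.ReadAllText(path))` on the sample should yield three entries, one per build file. Anything beyond this flat shape is better served by a real parser.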

alexey
0

I use the Regex classes when they are suitable, but for more structured data like you've shown here I would turn to ANTLR as documented here for C#.

Sam Harwell
0

If you need to be able to match nested braces, regexes will not work. You could use a parser generator like ANTLR, but this format looks simple enough to write your own recursive descent parser.

Before we can show you how to write the parser, we need to know what kind of DOM you want to output.
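
In the meantime, here is a minimal sketch of the recursive descent idea. It assumes the DOM is just nested `Dictionary<string, object>`, `List<object>`, and `string` values, which the question does not specify. The class `PbxParser` and all its members are hypothetical names. Two further assumptions go beyond the sample shown: that parenthesized, comma-separated lists occur in full project files, and that bare words consist of letters, digits, `_`, `.`, `$`, and `/`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical DOM: {...} -> Dictionary<string, object>, (...) -> List<object>, scalars -> string.
public sealed class PbxParser
{
    private readonly string _text;
    private int _pos;
    private PbxParser(string text) { _text = text; }

    public static object Parse(string text)
    {
        var parser = new PbxParser(text);
        object value = parser.ParseValue();
        parser.SkipTrivia();
        if (parser._pos != text.Length) throw parser.Error("unexpected trailing input");
        return value;
    }

    private object ParseValue()
    {
        SkipTrivia();
        if (StartsWith("{", _pos)) return ParseDictionary();
        if (StartsWith("(", _pos)) return ParseList();
        return ParseString();
    }

    private Dictionary<string, object> ParseDictionary()
    {
        var result = new Dictionary<string, object>();
        Expect('{');
        while (!TryConsume('}'))
        {
            string key = ParseString();
            Expect('=');
            result[key] = ParseValue();
            Expect(';');
        }
        return result;
    }

    private List<object> ParseList()
    {
        var result = new List<object>();
        Expect('(');
        while (!TryConsume(')'))
        {
            result.Add(ParseValue());
            if (!TryConsume(',')) { Expect(')'); break; } // a trailing comma is allowed
        }
        return result;
    }

    private string ParseString() // quoted string (escaped chars kept verbatim) or bare word
    {
        var sb = new StringBuilder();
        if (TryConsume('"'))
        {
            for (; _pos < _text.Length && _text[_pos] != '"'; _pos++)
            {
                if (_text[_pos] == '\\' && _pos + 1 < _text.Length) _pos++;
                sb.Append(_text[_pos]);
            }
            Expect('"'); // throws if the string is unterminated
            return sb.ToString();
        }
        while (_pos < _text.Length && IsBareChar(_pos)) sb.Append(_text[_pos++]);
        if (sb.Length == 0) throw Error("expected a key or value");
        return sb.ToString();
    }

    private bool IsBareChar(int i) => _text[i] == '/' // '/' is part of a word unless it opens a comment
        ? !StartsWith("/*", i) && !StartsWith("//", i)
        : char.IsLetterOrDigit(_text[i]) || "_.$".IndexOf(_text[i]) >= 0;

    private bool StartsWith(string s, int i) => string.CompareOrdinal(_text, i, s, 0, s.Length) == 0;

    private void SkipTrivia() // whitespace, /* block */ comments and // line comments
    {
        while (_pos < _text.Length)
        {
            if (char.IsWhiteSpace(_text[_pos])) { _pos++; continue; }
            string close = StartsWith("/*", _pos) ? "*/" : StartsWith("//", _pos) ? "\n" : "";
            if (close.Length == 0) return;
            int end = _text.IndexOf(close, _pos + 2, StringComparison.Ordinal);
            if (end < 0 && close == "*/") throw Error("unterminated comment");
            _pos = end < 0 ? _text.Length : end + close.Length;
        }
    }

    private bool TryConsume(char c)
    {
        SkipTrivia();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void Expect(char c) { if (!TryConsume(c)) throw Error("expected '" + c + "'"); }
    private FormatException Error(string m) => new FormatException(m + " at offset " + _pos);
}
```

Usage would look like `var root = (Dictionary<string, object>)PbxParser.Parse(File.ReadAllText(path));`, after which `root["objects"]` is itself a dictionary keyed by object id. Each grammar rule maps to one method, so whitespace and line breaks do not matter. Comments are discarded here; a DOM that needs to keep them would have to capture them in `SkipTrivia` instead.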

Dour High Arch
  • Why write a recursive descent parser by hand, when ANTLR provides similar (or better) performance with a much higher degree of maintainability? – Sam Harwell Jan 14 '14 at 05:12