When using the zmdp solver, I came across a funky file format that I haven't seen before; it uses `=>` for assignment. I wasn't able to find out what format it is from the package documentation (it says it is a "policy" format, but it must be based on something more generic).

{
  policyType => "MaxPlanesLowerBound",
  numPlanes => 7,
  planes => [
    {
      action => 2,
      numEntries => 3,
      entries => [
        0, 18.7429,
        1, 18.7426,
        2, 21.743
      ]
    },
    ### more entries ###
    {
      action => 3,
      numEntries => 3,
      entries => [
        0, 20.8262,
        1, 20.8261,
        2, 20.8259
      ]
    }
  ]
}

I spent a while researching a straightforward way to parse such files in Python, and also read this blog post, which covers a huge variety of options for lexing and parsing (the tools that looked most promising for my example were parsimonious and parsy).
However, every solution I can think of feels like re-inventing the wheel, and lexing and parsing seem like overkill for what I'm trying to do.
I also found this Stack Overflow question, which coincidentally also concerns a format that uses `=>`. However, being lazy and minimalistic when it comes to code, I don't like the regex solution too much. My gut feeling tells me that there must be a 3-4 line solution that reads the input file into a Python dict or similarly useful format. In particular, I suspect that this is already the standard syntax of some format I'm just not aware of (it's obviously not CSV, JSON, YAML or XML).

The question therefore is: Is the above a standard file format, and if yes, what is it?
If not, how do I parse this file elegantly and compactly in Python 3, i.e. without writing a regex for every keyword?

sophros
mattu
  • Looking at the [source](https://github.com/trey0/zmdp/blob/55174941c85eb04ab2f6ea13d7f3fd72e8530d54/src/pomdpBounds/MaxPlanesLowerBound.cc#L544), especially the [parsing code](https://github.com/trey0/zmdp/blob/55174941c85eb04ab2f6ea13d7f3fd72e8530d54/src/pomdpBounds/MaxPlanesLowerBound.cc#L598), it looks very ad hoc and not the "JSON with =>" you might guess from looking at it. (Also, the code writes a big JSON-incompatible comment section at the top that you probably stripped out.) – user2357112 Nov 25 '18 at 02:16
  • You can probably *read* the format by stripping out the header, replacing `=>` with `:`, and stuffing the result into a JSON parser, but *writing* the format by dumping JSON and replacing `:` with `=>` won't work. – user2357112 Nov 25 '18 at 02:19
  • (Also, unquoted keys are not standard JSON, and Python's `json` module doesn't accept them.) – user2357112 Nov 25 '18 at 02:27
  • @user2357112 True about the unquoted keys. As for writing, I don't have that requirement at the moment, but it's a good point. – mattu Nov 25 '18 at 02:57
  • For stripping out the comment section, I used `policy_str = re.sub(r"[#].*\n?", "", policy_str)` – mattu Nov 25 '18 at 03:24
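To make the comment-stripping step from the comments above concrete, here is a minimal sketch using the same pattern. The `raw` text is a made-up excerpt (the header wording is invented for illustration, not copied from a real zmdp file), and the approach assumes `#` never appears inside a quoted string value.

```python
import re

# Hypothetical excerpt: a "#" header line plus a "###" separator line
# in the middle of the data. The header text is invented for illustration.
raw = """# policy file header (illustrative only)
{
  numPlanes => 7,
  ### more entries ###
  action => 2
}
"""

# Remove everything from the first "#" on a line through the end of that
# line, including the newline if there is one.
# Assumption: "#" never occurs inside a quoted string value.
stripped = re.sub(r"[#].*\n?", "", raw)
print(stripped)
```

Because the pattern also consumes the trailing newline, whitespace that preceded a comment is joined to the next line. That is harmless here, since the format does not appear to be whitespace-sensitive.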

1 Answer

I don't see any differences from JSON here aside from replacing '=>' with ':' and adding a top-level key.

import json
filestr = filestr.replace('=>', ':')  # str.replace returns a new string, so reassign it
dictionary = json.loads(filestr)

Edited after seeing comment above.

Unquoted keys are indeed not part of the JSON standard. To address that, you can use a library as described here, or you can quote the keys with a regex.

John R
  • Exactly what I meant by elegant and compact! Great, thank you! – mattu Nov 25 '18 at 02:11
  • It actually required a little more work, as pointed out by @user2357112: I had to add quotation marks around the keys, so I ended up using `re.sub(r"\S+ =>", lambda m: "\"" + m.group(0).replace(" =>", "\":"), policy_str)` followed by `json.loads(policy_str)`. – mattu Nov 25 '18 at 03:21
  • but overall still pretty compact and less work than writing an entire parser. – mattu Nov 25 '18 at 03:23
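Putting the steps from this thread together (strip `#` comments, quote the bare keys, swap `=>` for `:`, then hand the result to `json`), a minimal end-to-end sketch could look like the following. The function name `parse_policy` is made up for illustration. The key regex is a slightly simplified variant of the one in the comment above, and it assumes that neither `#` nor `=>` ever appears inside a string value. The sample is a shortened copy of the data in the question.

```python
import json
import re


def parse_policy(text):
    """Parse zmdp-style policy text into Python objects (illustrative sketch)."""
    # 1. Drop "#" comments (assumes "#" never appears inside a string value).
    text = re.sub(r"#.*", "", text)
    # 2. Quote bare keys and turn "=>" into ":", i.e. key => v becomes "key": v
    #    (assumes "=>" never appears inside a string value).
    text = re.sub(r"(\w+)\s*=>", r'"\1":', text)
    # 3. The remainder is expected to be plain JSON.
    return json.loads(text)


sample = """
{
  policyType => "MaxPlanesLowerBound",
  numPlanes => 7,
  planes => [
    {
      action => 2,
      numEntries => 3,
      entries => [
        0, 18.7429,
        1, 18.7426,
        2, 21.743
      ]
    },
    ### more entries ###
    {
      action => 3,
      numEntries => 3,
      entries => [
        0, 20.8262,
        1, 20.8261,
        2, 20.8259
      ]
    }
  ]
}
"""

policy = parse_policy(sample)
print(policy["policyType"], len(policy["planes"]))
```

As noted in the question's comments, this only covers reading. Dumping a dict with `json.dumps` and replacing `:` with `=>` is not expected to reproduce the format zmdp writes.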