
I basically have a file with this structure:

root \
{
  field1 {
    subfield_a {
      "value1"
    }
    subfield_b {
      "value2"
    }
    subfield_c {
      "value1"
      "value2"
      "value3"
    }
    subfield_d {
    }
  }
  field2 {
    subfield_a {
      "value1"
    }
    subfield_b {
      "value1"
    }
    subfield_c {
      "value1"
      "value2"
      "value3"
      "value4"
      "value5"
    }
    subfield_d {
    }
  }
}

I want to parse this file with Python to get a multidimensional array that contains all the values of a specific subfield (for example, subfield_c). E.g.:

tmp = magic_parse_function("subfield_c", file)
print(tmp[0])  # ["value1", "value2", "value3"]
print(tmp[1])  # ["value1", "value2", "value3", "value4", "value5"]

I'm pretty sure I have to use the pyparsing module, but I don't know where to start with setting up the regex (?) expression. Can someone give me some pointers?

haster8558
  • If your input is as simple as the example you posted, you don't even need pyparsing: you can write your own tokenizer that manages a stack to track its depth. [Here](http://stackoverflow.com/a/4285211/1011859) someone does it with parentheses and no contents. Do you feel you can adapt this? If not, I can try to give some more pointers. (BTW: regular expressions can't count, so be careful when trying to use them for this kind of task) – pistache Aug 03 '16 at 10:29
  • How exactly are you modifying strings in Python, I'm curious ? :) – pistache Aug 03 '16 at 10:47
  • Basically, I deleted the \n, replaced the curly brackets with normal brackets, and deleted the "\t". Now I'm trying to figure out how to extract only what I need, but that's not a big deal. The hard part was getting an array with the right information. – haster8558 Aug 03 '16 at 10:50
  • If you find a working solution, it would be cool to post it as an answer to your own question :) – pistache Aug 03 '16 at 10:56
  • Yep, I'm working on it; I'm trying to write the "magic_parse_function". As soon as I've finished I'll post the solution. The problem is that I would like to get only a specific depth, but the function returns everything. – haster8558 Aug 03 '16 at 11:07
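
For reference, the stack-based tokenizer suggested in the comments above could look roughly like the sketch below. It is not taken from the linked answer: the name `collect_subfield`, the tokenizing regex, and the small `sample` input are all assumptions made for illustration, and it assumes balanced braces and double-quoted values without embedded quotes.

```python
import re

# Assumed token shapes: '{', '}', a double-quoted value, or a bare block name.
TOKEN_RE = re.compile(r'\{|\}|"[^"]*"|[^\s{}"]+')


def collect_subfield(name, source):
    """Return one list of values for each block called `name` in `source`."""
    results = []
    stack = []      # one entry per open block: a list to fill, or None
    pending = None  # last bare name seen, waiting for its '{'
    for token in TOKEN_RE.findall(source):
        if token == "{":
            values = [] if pending == name else None
            if values is not None:
                results.append(values)
            stack.append(values)
            pending = None
        elif token == "}":
            stack.pop()  # raises IndexError if the braces are unbalanced
        elif token.startswith('"'):
            # keep a value only if it sits directly inside a matching block
            if stack and stack[-1] is not None:
                stack[-1].append(token[1:-1])
        else:
            pending = token
    return results


sample = '''
root {
  field1 { subfield_c { "value1" "value2" } subfield_d { } }
  field2 { subfield_c { "value3" } }
}
'''
print(collect_subfield("subfield_c", sample))  # [['value1', 'value2'], ['value3']]
```

At any point `len(stack)` is the current nesting depth, so restricting matches to a specific depth would only need an extra check where the block is opened.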

1 Answer


You can let pyparsing take care of matching and iterating over the input. Just define what you want it to match, and pass it the body of the file as a string:

def magic_parse_function(fld_name, source):
    from pyparsing import Keyword, nestedExpr

    # define parser
    parser = Keyword(fld_name).suppress() + nestedExpr('{','}')("content")

    # search input string for matching keyword and following braced content
    matches = parser.searchString(source)

    # remove quotation marks
    return [[qs.strip('"') for qs in r[0].asList()] for r in matches]

# read content of file into a string 'file_body' and pass it to the function
tmp = magic_parse_function("subfield_c",file_body)

print(tmp[0])
print(tmp[1])

prints:

['value1', 'value2', 'value3']
['value1', 'value2', 'value3', 'value4', 'value5']
PaulMcG