
This is a follow-up to this question: Python parse text file into nested dictionaries

I initially accepted the answer that suggested formatting the input with regexes, but after looking more closely at the input, there are still some problems that I cannot handle with the proposed regexes.

So I am back at recursively parsing the lines into a dictionary.

What I have so far is:

def parseToDictionary(input):
    key = ''
    value = ''
    result = {}

    if input[0].startswith('{'): # remove {
        del input[0]

    result.clear() # clear the dict for each recursion

    for idx, line in enumerate(input):
        line = line.rstrip() # remove trailing returns

        if line.startswith('['):
            key = line
            value = parseToDictionary(input[idx+1:]) # parse the next level
        elif line.startswith('}'): # reached the end of a block
            return result
        else:
            elements = line.split('\t')
            key = elements[0]
            if len(elements) > 1:
                value = elements[1]
            else:
                value = 'Not defined' # some keys may not have a value, so set a generic value here
        if key:
            result[key] = value

    return result

Here is an example (very simplified!) input:

[HEADER1]
{
key1    value
key2    long value, with a comma
[HEADER2]
{
key 1234
emptykey
}
}

The output is:

'[HEADER2]': 
{
    'emptykey': 'Not defined', 
    'key': '1234'
}, 
'key2': 'long value, with a comma', 
'key1': 'value', 
'[HEADER1]': 
{
    'emptykey': 'Not defined', 
    'key2': 'long value, with a comma', 
    'key1': 'value', 
    'key': '1234', 
    '[HEADER2]': 
    {
        'emptykey': 'Not defined', 
        'key': '1234'
    }
 }, 
 'emptykey': 'Not defined', 
 'key': '1234'
 }

But it should be:

'[HEADER1]': 
{
    'key1': 'value', 
    'key2': 'long value, with a comma', 
    '[HEADER2]': 
    {
        'emptykey': 'Not defined', 
        'key': '1234'
    }
 }

So each line that starts with a [ is the key for the next block. Inside each block are multiple key-value pairs, and there could also be another nested level. What goes wrong is that some blocks are parsed multiple times, and I cannot figure out why.

The input parameter is mydatafile.split('\n')

Who can help me out?

koen

1 Answer


You have to skip the lines that are already processed in the subsections:

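def parse_to_dictionary(lines):
    def parse_block(lines):
        # `lines` is a shared iterator, so whatever a nested call consumes
        # is skipped automatically when this loop resumes.
        contents = {}
        if next(lines).strip() != '{':
            raise AssertionError("'{' expected")
        for line in lines:
            line = line.strip()
            if not line:
                continue  # ignore blank lines instead of failing on line[0]
            if line == '}':
                return contents
            elif line.startswith('['):
                contents[line] = parse_block(lines)
            else:
                # split on the first tab only; keys without a value map to None
                parts = line.split('\t', 1)
                contents[parts[0]] = None if len(parts) == 1 else parts[1]
        raise AssertionError("'}' expected")

    lines = iter(lines)
    key = next(lines).strip()
    if not key.startswith('['):
        raise AssertionError("format error")
    return {key: parse_block(lines)}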
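
To see why sharing one iterator skips the already-processed lines, while slicing the list (as in the question) does not, here is a minimal sketch. The names `take_until_close`, `walk_with_slice` and `walk_with_iterator` are made up for this illustration and are not part of the parser above; the toy input stands in for the real file.

```python
def take_until_close(it):
    """Hypothetical helper: consume items from `it` up to and including '}'."""
    taken = []
    for item in it:
        if item == '}':
            break
        taken.append(item)
    return taken


def walk_with_slice(items):
    # Mirrors the question: the nested call gets a *copy* (a slice), so the
    # outer loop still visits the lines the nested call already handled.
    seen = []
    for idx, item in enumerate(items):
        if item == '{':
            seen.append(take_until_close(iter(items[idx + 1:])))
        else:
            seen.append(item)
    return seen


def walk_with_iterator(items):
    # Mirrors the answer: outer loop and nested call share one iterator, so
    # the outer loop resumes right after the closing '}'.
    it = iter(items)
    seen = []
    for item in it:
        if item == '{':
            seen.append(take_until_close(it))
        else:
            seen.append(item)
    return seen


data = ['a', '{', 'b', '}', 'c']
print(walk_with_slice(data))     # ['a', ['b'], 'b', '}', 'c']
print(walk_with_iterator(data))  # ['a', ['b'], 'c']
```

In the sliced version 'b' and '}' show up twice, once inside the nested result and once at the outer level, which is the same duplication visible in the question's output.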
Daniel
  • Since the first line is not `{`, it will fail immediately. What I am really trying to do is just skip the lines with `{`. – koen Oct 28 '17 at 21:14
  • So to fix that, I commented out the `raise AssertionError("'{' expected")` line and indented the for-loop. Works perfectly now. – koen Oct 28 '17 at 21:49
  • Disregard these comments, there was an error in my input file. – koen Oct 29 '17 at 04:27
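
As a side note to the thread: if the index-based recursion from the question is preferred over an iterator, one possible fix is to have each call report where it stopped so the caller can jump past the nested block. The sketch below is only an illustration under a few assumptions: `parse_block_indexed` is a hypothetical name, keys and values are separated by a tab (as in the question's `split('\t')`), lines containing only `{` are simply skipped, and missing values become 'Not defined' as in the question.

```python
def parse_block_indexed(lines, start=0):
    """Parse lines[start:] until the matching '}'.

    Returns (result, next_index), where next_index is the first line after
    the block, so the caller can skip everything already consumed.
    """
    result = {}
    idx = start
    while idx < len(lines):
        line = lines[idx].strip()
        idx += 1
        if not line or line == '{':
            continue  # blank lines and opening braces carry no data
        if line == '}':
            break  # end of this block
        if line.startswith('['):
            # Recurse, then resume after the lines the nested call consumed.
            result[line], idx = parse_block_indexed(lines, idx)
        else:
            key, _, value = line.partition('\t')
            result[key] = value or 'Not defined'
    return result, idx


sample = (
    "[HEADER1]\n{\n"
    "key1\tvalue\n"
    "key2\tlong value, with a comma\n"
    "[HEADER2]\n{\n"
    "key\t1234\n"
    "emptykey\n"
    "}\n}\n"
)
parsed, _ = parse_block_indexed(sample.split('\n'))
print(parsed)
```

For the sample input this yields a single '[HEADER1]' entry whose value contains 'key1', 'key2' and the nested '[HEADER2]' dictionary, matching the expected output in the question.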