This is a follow-up to this question: Python parse text file into nested dictionaries
I initially accepted the answer that suggested formatting the input with regexes, but after looking more closely at the input, there are still some cases that I cannot process with the proposed regexes.
So I am back at recursively parsing the lines into a dictionary.
What I have so far is:
def parseToDictionary(input):
    key = ''
    value = ''
    result = {}
    if input[0].startswith('{'): # remove {
        del input[0]
    result.clear() # clear the dict for each recursion
    for idx, line in enumerate(input):
        line = line.rstrip() # remove trailing returns
        if line.startswith('['):
            key = line
            value = parseToDictionary(input[idx+1:]) # parse the next level
        elif line.startswith('}'): # reached the end of a block
            return result
        else:
            elements = line.split('\t')
            key = elements[0]
            if len(elements) > 1:
                value = elements[1]
            else:
                value = 'Not defined' # some keys may not have a value, so set a generic value here
        if key:
            result[key] = value
    return result
Here is an example (very simplified!) input; keys and values are separated by a tab:
[HEADER1]
{
key1 value
key2 long value, with a comma
[HEADER2]
{
key 1234
emptykey
}
}
The output is:
{
    '[HEADER2]':
    {
        'emptykey': 'Not defined',
        'key': '1234'
    },
    'key2': 'long value, with a comma',
    'key1': 'value',
    '[HEADER1]':
    {
        'emptykey': 'Not defined',
        'key2': 'long value, with a comma',
        'key1': 'value',
        'key': '1234',
        '[HEADER2]':
        {
            'emptykey': 'Not defined',
            'key': '1234'
        }
    },
    'emptykey': 'Not defined',
    'key': '1234'
}
But it should be:
{
    '[HEADER1]':
    {
        'key1': 'value',
        'key2': 'long value, with a comma',
        '[HEADER2]':
        {
            'emptykey': 'Not defined',
            'key': '1234'
        }
    }
}
So each line that starts with a [ is the key for the next block. Inside each block there are multiple key-value pairs, and there can also be another nested level. The problem is that some blocks are parsed multiple times, and I cannot figure out why.
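To make the intended behaviour concrete, here is a minimal sketch of the structure described above. It assumes tab-separated pairs and that every header line is followed by a { line. Instead of slicing the list, it passes one shared iterator down the recursion, so each line is consumed exactly once. The names parse_block, parse_to_dictionary and SAMPLE are hypothetical, and this is only an illustration under those assumptions, not something verified against the real data.

```python
def parse_block(lines):
    """Parse lines from a shared iterator until the current block's closing '}'."""
    result = {}
    for raw in lines:
        line = raw.strip()  # assumption: surrounding whitespace is not significant
        if not line or line == '{':
            continue  # skip blank lines and the opening brace of a block
        if line.startswith('}'):
            return result  # end of the current block
        if line.startswith('['):
            # The recursive call advances the same iterator, so the nested
            # lines are never seen again by this loop.
            result[line] = parse_block(lines)
        else:
            key, _, value = line.partition('\t')
            result[key] = value if value else 'Not defined'
    return result  # input exhausted (top level)


def parse_to_dictionary(text):
    """Hypothetical entry point: split the text and parse it with one iterator."""
    return parse_block(iter(text.split('\n')))


# Hypothetical sample mirroring the simplified input, with explicit tabs.
SAMPLE = (
    "[HEADER1]\n{\n"
    "key1\tvalue\n"
    "key2\tlong value, with a comma\n"
    "[HEADER2]\n{\n"
    "key\t1234\n"
    "emptykey\n"
    "}\n}\n"
)

print(parse_to_dictionary(SAMPLE))
```

The difference from parseToDictionary is that the caller does not keep its own position in the list: after a nested block returns, the outer loop resumes at the line following that block's closing }.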
The input parameter is mydatafile.split('\n')
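As a side note on that input, here is a small sketch of how split('\n') differs from splitlines(), assuming the file might use Windows line endings. The sample string is made up for illustration.

```python
# Hypothetical file content with Windows line endings.
text = "key1\tvalue\r\nemptykey\r\n"

# split('\n') keeps the '\r' on each line and yields a trailing empty string.
lines_split = text.split('\n')

# splitlines() splits on '\n', '\r\n' and '\r' (among other line boundaries)
# and does not yield a trailing empty string.
lines_clean = text.splitlines()

print(lines_split)
print(lines_clean)
```

This is why parseToDictionary calls rstrip() on each line and only stores an entry when the key is non-empty.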
Who can help me out?