I'm parsing a file and collecting stats. I want to store those stats in a nested dict whose innermost values are lists, and extend those lists as I process the file.
For instance, my dict structure is something like this:
data_dict = {
    "aa1": {'aa': [], 'bb': []},
    "aa2": {'ab': [], 'ba': []},
}
Now, as I parse the file, I want to append each value to the innermost list. For instance, after the first occurrence of the data my dict should look like this:
data_dict = {
    "aa1": {'aa': ['a0'], 'bb': ['a1']},
    "aa2": {'ab': ['b0'], 'ba': ['b1']},
}
and after the second occurrence, something like this:
data_dict = {
    "aa1": {'aa': ['a0', 'a01'], 'bb': ['a1', 'a11']},
    "aa2": {'ab': ['b0', 'b01'], 'ba': ['b1', 'b11']},
}
Also, I'm not initializing the dict keys up front; each key is created at the first occurrence of a match. Can anyone suggest how to achieve this?
Note: I'm using autovivification to initialize data_dict, which starts out empty.
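For reference, here is a minimal sketch of one autovivifying structure whose innermost values are lists. It assumes the standard library's collections.defaultdict is an acceptable stand-in for a custom autovivification class (the class actually used here is not shown); the keys and values are the ones from the example above.

```python
from collections import defaultdict

# Outer keys create an inner defaultdict on first access;
# inner keys create an empty list on first access.
data_dict = defaultdict(lambda: defaultdict(list))

# First occurrence: both key levels are created implicitly.
data_dict["aa1"]["aa"].append("a0")
data_dict["aa1"]["bb"].append("a1")

# Second occurrence: the existing lists simply grow.
data_dict["aa1"]["aa"].append("a01")
data_dict["aa1"]["bb"].append("a11")

print(dict(data_dict["aa1"]))
```

The point of the design is that the leaf factory is list rather than another dict, so .append() is available as soon as a key is first touched.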
This is a sample of the data I'm trying to parse:
DATETIME TYPE TAG COUNT MEAN 1% 10% 20% 30% 40% 50% 60% 70% 80% 90% 99%
20151109044056 LS_I aa8 57 80,493,122 8,931,000 8,937,000 8,944,000 8,974,000 9,073,000 21,262,000 28,419,000 35,794,000 148,920,000 316,408,000 447,902,000
20151109044056 LS_I aa0 6,893 9,008,024 8,862,000 8,913,000 8,941,000 8,964,000 8,984,000 9,006,000 9,028,000 9,049,000 9,071,000 9,102,000 9,170,000
20151109044056 LS_I aa1 6,062 9,018,094 8,867,000 8,913,000 8,938,000 8,961,000 8,983,000 9,003,000 9,025,000 9,048,000 9,071,000 9,103,000 9,175,000
20151109044056 LS_I aa2 2,776 9,030,621 8,929,000 8,967,000 8,987,000 8,999,000 9,012,000 9,024,000 9,037,000 9,050,000 9,065,000 9,087,000 9,161,000
20151109044056 LS_I aa3 1,074 9,028,744 8,925,000 8,970,000 8,988,000 9,002,000 9,016,000 9,026,000 9,039,000 9,051,000 9,067,000 9,089,000 9,138,000
20151109044056 LS_I aa4 6,060 9,003,651 8,874,000 8,935,000 8,958,000 8,976,000 8,991,000 9,005,000 9,019,000 9,033,000 9,049,000 9,071,000 9,121,000
20151109044056 LS_I aa5 5,453 9,003,993 8,874,000 8,936,000 8,959,000 8,976,000 8,991,000 9,004,000 9,018,000 9,032,000 9,048,000 9,071,000 9,126,000
20151109044056 LS_I aa6 16,384 328 111 165 190 208 227 253 301 362 434 551 997
20151109044056 LS_I aa7 16,384 316 58 65 70 76 87 137 308 395 512 702 1,562
So my dict has the TAG column as its first key and one of the stat columns (a % column, COUNT or MEAN) as its second key; the value under that key is a list of every instance of that value in the whole file.
This is my processing code, which is not working:
    # Process consecutive data rows (each starts with a 14-digit timestamp).
    while re.match(r"\d{14}\s.*", curr_line):
        lat_data = curr_line.split()
        tag = lat_data[header.index("TAG")]
        # Keep only the COUNT, MEAN and percentile columns.
        for item in range(len(header)):
            col = header[item]
            if '%' in col or "COUNT" in col or "MEAN" in col:
                self.data_dict[tag][col].append(lat_data[item])
        curr_line = next(lat_file)
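One possible shape for the collection step is sketched below. It again assumes a defaultdict whose leaves are lists in place of the autovivification class. The function name collect_stats and the sample_lines input are hypothetical, made up for illustration; the header and rows are copied from the sample data above. A plain for loop replaces the while/next() pattern, so reaching the end of the input does not raise StopIteration.

```python
import re
from collections import defaultdict

# Matches data rows: a 14-digit timestamp followed by whitespace.
ROW_RE = re.compile(r"\d{14}\s")


def collect_stats(lines):
    """Group COUNT, MEAN and percentile values by TAG, then by column."""
    data_dict = defaultdict(lambda: defaultdict(list))
    header = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "DATETIME":
            header = fields  # remember column names for later rows
        elif header and ROW_RE.match(line):
            tag = fields[header.index("TAG")]
            for col, value in zip(header, fields):
                if "%" in col or col in ("COUNT", "MEAN"):
                    data_dict[tag][col].append(value)
    return data_dict


# Hypothetical input: the header plus two rows from the sample,
# with the aa6 row repeated to show the list growing.
sample_lines = [
    "DATETIME TYPE TAG COUNT MEAN 1% 10% 20% 30% 40% 50% 60% 70% 80% 90% 99%",
    "20151109044056 LS_I aa6 16,384 328 111 165 190 208 227 253 301 362 434 551 997",
    "20151109044056 LS_I aa7 16,384 316 58 65 70 76 87 137 308 395 512 702 1,562",
    "20151109044056 LS_I aa6 16,384 328 111 165 190 208 227 253 301 362 434 551 997",
]

stats = collect_stats(sample_lines)
print(stats["aa6"]["50%"])
print(stats["aa7"]["99%"])
```

Values are kept as the original strings (including thousands separators such as "16,384"); converting them to numbers would be a separate step.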