Edited
Sorry for the misunderstanding of the question. I think the conversion of the zip file can be done as follows. I assume that the 1st and last positions of the file are {
and }
respectively without any newlines symbols.
data = eval(f"[{file.replace('\n', ',')}")
Also being inspired by this post I can offer a solution using pure pandas. If we import the data into a pandas DataFrame, by passing to the constructor a list of the dictionaries above, the problem becomes in how to explode
a list within a DataFrame's cell.
This is done by the combination of .apply(pd.Series)
that creates multiple columns for the different elements within the list and pd.melt
that transforms those columns back into a single column. By previously setting as index the other columns we can save them for the resulting DataFrame.
Here the code:
# Formatting the data to be introduced in the pd.DataFrame
data = [{"id":"1","f":"A","data":[["2040",0],["2039",0],["2038",0],["2037",0],["2036",0]]},
{"id":"2","f":"A","data":[["2040",0],["2039",0],["2038",0],["2037",0],["2036",0]]},
{"id":"3","f":"A","data":[["2040",0],["2039",0],["2038",0],["2037",0],["2036",0]]},
{"id":"4","f":"A","data":[["2040",0],["2039",0],["2038",0],["2037",0],["2036",0]]},
{"id":"5","f":"A","data":[["2040",0],["2039",0],["2038",0],["2037",0],["2036", 0]]}
]
# And the piece of code
(pd.melt((pd.DataFrame(data)
.assign(data=lambda x: [[l[0] for l in ls] for ls in x.data])
.set_index(['id', 'f'])
.data.apply(pd.Series)
.reset_index()
),
id_vars=['id', 'f'],
value_name='data'
)
.set_index(['id', 'f'])
.drop('variable', axis=1)
.dropna()
.sort_index()
.reset_index()
)
id f data
0 1 A 2040
1 1 A 2039
2 1 A 2038
3 1 A 2037
4 1 A 2036
5 2 A 2040
6 2 A 2039
7 2 A 2038
8 2 A 2037
9 2 A 2036
10 3 A 2040
11 3 A 2039
12 3 A 2038
13 3 A 2037
14 3 A 2036
15 4 A 2040
16 4 A 2039
17 4 A 2038
18 4 A 2037
19 4 A 2036
20 5 A 2040
21 5 A 2039
22 5 A 2038
23 5 A 2037
24 5 A 2036