
I have a dataset that follows this format:

data =[[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1','ref2']

I want to convert it to a dataframe:

dic = {'var1': var1, 'var2': var2, 'data': data}

import pandas as pd
pd.DataFrame(dic)

The result:

[Image: initial DataFrame]
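The image of the initial DataFrame is not preserved. This minimal sketch (assuming a current pandas release) repeats the constructor call and inspects the result. Each 'data' cell holds a whole outer list as a single Python object, so the inner lists are not expanded into rows.

```python
import pandas as pd

# Sample data from the question
data = [[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1', 'ref2']

dic = {'var1': var1, 'var2': var2, 'data': data}
df = pd.DataFrame(dic)

# Only two rows: the nested lists are stored as objects, not expanded
print(df.shape)                # (2, 3)
print(df['data'].dtype)        # object
print(len(df.loc[1, 'data']))  # 3 inner lists in the second row
```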

However, I'm trying to get something like this:

[Image: desired DataFrame]

I've tried to flatten the dictionary/lists, without success:

pd.DataFrame([[col1, col2] for col1, d in dic.items() for col2 in d])

See the result:

[Image: result of the flattening attempt]
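The image of that result is also lost. This short sketch shows what the comprehension builds. It pairs each dictionary key with each element of that key's list, so it only goes one level deep. Each 'data' entry therefore stays a whole nested list in a single cell.

```python
import pandas as pd

data = [[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1', 'ref2']
dic = {'var1': var1, 'var2': var2, 'data': data}

# The attempt from the question: one [key, element] pair per list element
rows = [[col1, col2] for col1, d in dic.items() for col2 in d]
df_try = pd.DataFrame(rows)

print(len(rows))    # 6: two entries each for var1, var2 and data
print(rows[0])      # ['var1', 10.0]
print(rows[4][0])   # data (its second item is still a nested list)
```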

The inner lists have different lengths, which makes unpacking the extra level of nesting complicated. I'm not sure whether pandas can take care of this or if it needs to be done before importing into pandas.
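Pandas can do this reshaping itself. Neither answer below uses this approach, but here is a sketch with `DataFrame.explode`. `explode` was added in pandas 0.25 and its `ignore_index` argument in pandas 1.1, so this assumes a reasonably recent version. `explode` gives each inner list its own row and repeats var1/var2. The 3-element lists are then spread into columns.

```python
import pandas as pd

data = [[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1', 'ref2']
df = pd.DataFrame({'var1': var1, 'var2': var2, 'data': data})

# One row per inner list; var1/var2 are repeated, index renumbered 0..n-1
exploded = df.explode('data', ignore_index=True)

# Each 'data' cell is now a 3-element list: turn those into three columns
parts = pd.DataFrame(exploded['data'].tolist(), columns=['data', 'min', 'max'])

# Both frames share the index 0..n-1, so axis=1 concatenation lines up
result = pd.concat([parts, exploded[['var1', 'var2']]], axis=1)
print(result)
```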

Yoann_R

2 Answers


Building a flat list of rows first, one per inner list with var1 and var2 appended, works:

import pandas as pd

new_data = []
for x, v1, v2 in zip(data, var1, var2):
    # Append this row's var1/var2 to each of its inner [data, min, max] lists
    new_data.extend([y + [v1, v2] for y in x])
pd.DataFrame(new_data, columns=['data', 'min', 'max', 'var1', 'var2'])

gives:

   data   min   max  var1  var2
0     1     0  1000    10  ref1
1     2  1000  2000    10  ref1
2     1     0  1500    20  ref2
3     2  1500  2500    20  ref2
4     2  2500  4000    20  ref2
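The same approach can also be written as a single nested list comprehension. This is a sketch equivalent to the loop above, not part of the original answer. The names `rows` and `df` are illustrative.

```python
import pandas as pd

data = [[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1', 'ref2']

# Outer loop pairs each group with its var1/var2; inner loop walks the group
rows = [inner + [v1, v2]
        for outer, v1, v2 in zip(data, var1, var2)
        for inner in outer]
df = pd.DataFrame(rows, columns=['data', 'min', 'max', 'var1', 'var2'])
print(df)
```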
Mike Müller

You can iterate over the rows of your temporary DataFrame, build a small DataFrame for each row, and concatenate them:

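import pandas as pd

df = pd.DataFrame(dic)
result = []
for _, d in df.iterrows():
    # One small DataFrame per original row, built from its nested 'data' list
    temp = pd.DataFrame(d['data'], columns=['data', 'min', 'max'])
    # Broadcast the scalar var1/var2 values to every row of temp
    temp['var1'] = d['var1']
    temp['var2'] = d['var2']
    result.append(temp)
pd.concat(result)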

This produces

   data   min   max  var1  var2
0     1     0  1000    10  ref1
1     2  1000  2000    10  ref1
0     1     0  1500    20  ref2
1     2  1500  2500    20  ref2
2     2  2500  4000    20  ref2
chrisaycock
  • The picture for the desired result in the question shows indices `0, 1, 2, 3, 4`. You have `0, 1, 0, 1, 2` as indices. – Mike Müller May 19 '15 at 20:37
  • For my use the indices are not essential; in my dataset I will set a specific column as the index. – Yoann_R May 19 '15 at 21:06
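The first comment points out that the concatenated index repeats. If a plain 0..n-1 index is wanted, here is a sketch of the second answer's loop with `pd.concat(..., ignore_index=True)`, which discards the per-piece indices. The names `frames` and `result` are illustrative.

```python
import pandas as pd

data = [[[1, 0, 1000], [2, 1000, 2000]],
        [[1, 0, 1500], [2, 1500, 2500], [2, 2500, 4000]]]
var1 = [10.0, 20.0]
var2 = ['ref1', 'ref2']
df = pd.DataFrame({'var1': var1, 'var2': var2, 'data': data})

frames = []
for _, row in df.iterrows():
    temp = pd.DataFrame(row['data'], columns=['data', 'min', 'max'])
    temp['var1'] = row['var1']
    temp['var2'] = row['var2']
    frames.append(temp)

# ignore_index=True replaces the repeated 0, 1, 0, 1, 2 with 0..n-1
result = pd.concat(frames, ignore_index=True)
print(result.index.tolist())   # [0, 1, 2, 3, 4]
```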