I have a pandas DataFrame in which one column contains arrays. I'd like to "flatten" it by repeating the values of the other columns for each element of those arrays.
I managed to do this by iterating over every row and building a temporary list of values, but that runs in pure Python and is slow.
Is there a way to do this in pandas/numpy? In other words, I'm trying to improve the flatten function in the example below.
Thanks a lot.
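import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []  # one dict per output row

    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        # Emit one output row per element of the array in 'z'.
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})

    # Row-wise apply is used only for its side effect on tmp.
    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))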
Which gives:
x  y   z
1 10 101
1 10 102
1 10 103
2 20 201
2 20 202
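
For reference, here is a minimal sketch of one vectorized direction this could take. It is not a definitive answer. The helper name `flatten_vectorized` and its `col` parameter are made up for illustration. It assumes every entry in the column is a sequence (tuple, list or array) and that at least one of them is non-empty. The idea is to compute the length of each array, repeat the row positions that many times with `np.repeat`, and then concatenate all the arrays into a single column.

```python
import numpy as np
import pandas as pd


def flatten_vectorized(df, col="z"):
    """Hypothetical helper: one output row per element of the sequences in `col`."""
    # Number of elements in each row's sequence, e.g. [3, 2].
    lengths = df[col].map(len).to_numpy()
    # Positional row numbers repeated once per element, e.g. [0, 0, 0, 1, 1].
    rows = np.repeat(np.arange(len(df)), lengths)
    # Repeat the other columns accordingly and start from a fresh 0..n-1 index.
    out = df.drop(columns=col).iloc[rows].reset_index(drop=True)
    # Join all sequences end to end (assumes at least one is non-empty).
    out[col] = np.concatenate(df[col].tolist())
    return out


to_convert = pd.DataFrame({
    "x": [1, 2],
    "y": [10, 20],
    "z": [(101, 102, 103), (201, 202)],
})

result = flatten_vectorized(to_convert)
print(result.to_string(index=False))
```

On pandas 0.25 or later, `DataFrame.explode` should give a similar result with `to_convert.explode("z")`. Note that it leaves the exploded column with object dtype and keeps the original (repeated) index, so a `reset_index(drop=True)` and a cast may still be needed afterwards.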