I have a dataframe with a column whose values are string representations of lists:

import pandas as pd
df = pd.DataFrame([
    {"a":"a1", "b":"['b11','b12','b13']"},
    {"a":"a2", "b":"['b21','b22','b23']"}
])

which is just:

    a                    b
0  a1  ['b11','b12','b13']
1  a2  ['b21','b22','b23']

How can I unfold it so that each list element gets its own row, like this:

    a    b
0  a1  b11
1  a1  b12
2  a1  b13
3  a2  b21
4  a2  b22
5  a2  b23

My first guess was:

from functools import reduce
vls = df.apply(lambda x: [{'a': x['a'], 'b': b} for b in list(eval(x['b']))], axis=1).values
df = pd.DataFrame(reduce(lambda x, y: x + y, vls))

It works, but it takes a very long time even for a small subset (~1000 rows) of my data, and I need to apply it to millions of rows.

I wonder if there is a better way using only the pandas API.

Thiago Melo

1 Answer

Try this:

df.groupby('a').apply(lambda df: pd.DataFrame({'a':[df.a.iloc[0]] * len(eval(df.b.iloc[0])),'b': eval(df.b.iloc[0])}))

Instead of reduce, this uses groupby to expand each row into its own small dataframe. It assumes the values in column a are unique, because only the first b of each group is read.
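If your pandas version has DataFrame.explode (added in 0.25, as far as I know), a different approach is possible. The sketch below is not the groupby method above: it parses each string with ast.literal_eval (a safer stand-in for eval that only accepts literals) and then lets explode produce one row per list element. The names parsed and result are only illustrative, and it does not need column a to be unique.

```python
import ast

import pandas as pd

df = pd.DataFrame([
    {"a": "a1", "b": "['b11','b12','b13']"},
    {"a": "a2", "b": "['b21','b22','b23']"},
])

# Turn each string such as "['b11','b12','b13']" into a real Python list.
# ast.literal_eval only evaluates literals, so it will not run arbitrary code.
parsed = df.assign(b=df["b"].map(ast.literal_eval))

# explode emits one row per list element and repeats the other columns;
# reset_index(drop=True) replaces the repeated original index with 0..n-1.
result = parsed.explode("b").reset_index(drop=True)

print(result)
```

This should avoid building one small dataframe per row, so it will probably scale better, but it is worth timing on a sample of your own data before relying on it.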

Rocky Li