16

I have a pandas dataframe that looks like this:

COL     data
line1   [A,B,C]

where the items in the data column could either be a list or just comma separated elements. Is there an easy of way of getting:

COL     data
line1   A
line1   B
line1   C

I could iterate over the list and manually duplicate the rows via python, but is there some magic pandas trick for doing this? The key point is how to automatically duplicate the rows.

Thanks!

vgoklani
  • 10,685
  • 16
  • 63
  • 101

2 Answers2

8

You could write a simple cleaning function to make it a list (assuming it's not a list of commas, and you can't simply use ast.literal_eval):

def clean_string_to_list(s):
    return [c for c in s if c not in '[,]']  # you might need to catch errors

df['data'] = df['data'].apply(clean_string_to_list)

Iterating through the rows seems like a reasonable choice:

In [11]: pd.DataFrame([(row['COL'], d)
                       for d in row['data']
                       for _, row in df.iterrows()],
                       columns=df.columns)
Out[11]:
     COL data
0  line1    A
1  line1    B
2  line1    C

I'm afraid I don't think pandas caters specifically for this kind of manipulation.

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
3

You can use df.explode() option. Refer to the documentation. I believe this is exactly the functionality you need.

Timus
  • 10,974
  • 5
  • 14
  • 28