0

I have a dataframe where one of the columns contains a list. I want to break up these lists so that each element has its own row.

Ex df:

index    Name              Color
  1      Ford    ['Red,Blue' , 'Red,Blue']

result df:

index    Name    Color
  1      Ford    Red
  2      Ford    Blue
  3      Ford    Red
  4      Ford    Blue

The code that I tried:

s = df['Color'].str.split(',').apply(Series,1).stack()
s.index = s.index.droplevel(-1)
s.name = 'Color'
del df['Color']
df = df.join(s)
MaxB
  • 428
  • 1
  • 8
  • 24
  • 1
    Possible duplicate of [pandas: When cell contents are lists, create a row for each element in the list](https://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list) – Cohan Apr 30 '19 at 17:17
  • This is close, gets half way there. Full code below. – MaxB Apr 30 '19 at 17:27

2 Answers2

0

Figured it out, answer below:

s = df.apply(lambda x: pd.Series(x['Color']),axis=1).stack.reset_index(level=1, drop=True)
s.Name = 'Color'
df = df.drop('Color', axis=1).join(s)

s = df['Color'].str.split(',').apply(Series,1).stack()
s.index = s.index.droplevel(-1)
s.name = 'Color'
del df['Color']
df = df.join(s)
MaxB
  • 428
  • 1
  • 8
  • 24
0

Using apply on big data set is really slow. I come up with solution without using apply as follows: set_index on columns index and Name. Next, join and split on Color. Finally, create new datafrom from color list, then stack and reset_index and drop unwanted columns.

Using df as follows:

In [2370]: df
Out[2370]:
   index   Name                       Color
0      1   Ford        [Red,Blue, Red,Blue]
1      1  Chevy  [Yellow,Blue, Yellow,Blue]
2      1  Tesla     [White,Green, Red,Blue]


df.set_index(['index', 'Name'], inplace=True)
color_list = [','.join(st).split(',') for st in df.Color.tolist()]
pd.DataFrame(color_list, index=df.index).stack().reset_index(level=[1, 2]).drop('level_2', 1)

Out[2376]:
        Name       0
index
1       Ford     Red
1       Ford    Blue
1       Ford     Red
1       Ford    Blue
1      Chevy  Yellow
1      Chevy    Blue
1      Chevy  Yellow
1      Chevy    Blue
1      Tesla   White
1      Tesla   Green
1      Tesla     Red
1      Tesla    Blue
Andy L.
  • 24,909
  • 4
  • 17
  • 29
  • You need an extra step, since your example `df` does not match OP's. It should be created with something like `df = pd.DataFrame({"Name": ['Ford'], "Color": [["'Red,Blue'" , "'Red,Blue'"]]})` – m13op22 Apr 30 '19 at 20:21
  • @HS-nebula: base on his posted answer and the desired output, the additional single-quotes in the string is a typo. I tested his answer to check runtime with `%timeit` before I posted my answer. His answer provided output match his desired output only when there is no additional single-quotes. – Andy L. Apr 30 '19 at 21:20