Python dataframe separate cell values containing lists

Question

I have a dataframe df:

        0               1               2   
Mon ['x','y','z']   ['a','b','c']   ['a','b','c']
Tue ['a','b','c']   ['a','b','c']   ['x','y','z']
Wed ['a','b','c']   ['a','b','c']   ['a','b','c']

The lists may all differ from one another (or some may be identical), and I want to convert the dataframe into this form:

    0 1 2
Mon x a a
Mon y b b
Mon z c c
Tue a a x
Tue b b y
Tue c c z
Wed a a a
Wed b b b
Wed c c c

I have looked at some previous SO questions: "Explode lists with different lengths in Pandas" and "Split (explode) pandas dataframe string entry to separate rows".

I have tried to use their solutions but I am unable to get the desired output. How can I achieve this?

import numpy as np

s1 = df[0]
s2 = df[1]
s3 = df[2]
# positional row numbers, each repeated once per element of that row's list
i1 = np.arange(len(df)).repeat(s1.str.len())
i2 = np.arange(len(df)).repeat(s2.str.len())
i3 = np.arange(len(df)).repeat(s3.str.len())
# repeat the rows (all columns but the last) and add the flattened lists as 'Shared Codes'
df.iloc[i1, :-1].assign(**{'Shared Codes': np.concatenate(s1.values)})
df.iloc[i2, :-1].assign(**{'Shared Codes': np.concatenate(s2.values)})
df.iloc[i3, :-1].assign(**{'Shared Codes': np.concatenate(s3.values)})

Also, this doesn't seem like a reasonable approach if I have even more columns. I'm using Python 2.7.

score 3 · Accepted Answer · answered Apr 14 '18 at 11:43

This is one way using itertools.chain and numpy.repeat:

import pandas as pd, numpy as np
from itertools import chain

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])

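# Flatten each column's lists in order; repeat each index label once per
# element, using the list lengths in column 0 (all lists in a row must match)
res = pd.DataFrame({k: list(chain.from_iterable(df[k])) for k in df},
                   index=np.repeat(df.index, list(map(len, df[0]))))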

print(res)

#      0  1  2
# Mon  x  a  a
# Mon  y  b  b
# Mon  z  c  c
# Tue  a  a  x
# Tue  b  b  y
# Tue  c  c  z
# Wed  a  a  a
# Wed  b  b  b
# Wed  c  c  c
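On newer pandas, `DataFrame.explode` can do this directly: since pandas 1.3 it accepts a list of columns and explodes them together, provided the lists in each row have matching lengths (otherwise it raises a ValueError). pandas 1.3 requires Python 3, so this is not an option on the asker's Python 2.7; treat it as a sketch for current setups. The sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])

# Explode every column at once (pandas >= 1.3); each index label is repeated
# once per list element, and list order is kept
res = df.explode(list(df.columns))
print(res)
```

This scales to any number of columns without naming them, which was the asker's concern with the per-column attempt.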

score 1 · Answer 2 · answered Apr 14 '18 at 11:55

I'd do it this way:

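import numpy as np
import pandas as pd

dfs = []
for day in df.index:
    # rows = original columns; transpose so each list element becomes a row
    part = pd.DataFrame(df.loc[day].tolist(), index=df.columns).T
    # one day label per list element (not per column)
    part.index = np.repeat(day, len(part))
    dfs.append(part)
result = pd.concat(dfs)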

John Zwinck
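The same loop can be wrapped in a function and checked on a frame whose lists are shorter than the number of columns. This is a sketch rather than the answerer's own code: `explode_rows` is a hypothetical name, and it assumes unique index labels and equal list lengths within each row. Repeating each label `len(part)` times (once per list element) rather than once per column is what lets it handle that case.

```python
import numpy as np
import pandas as pd


def explode_rows(df):
    """Hypothetical helper: one output row per list element, per original row.

    Assumes unique index labels and equal list lengths within each row.
    """
    parts = []
    for label in df.index:
        # rows of this frame are the original columns; transpose so each
        # list element becomes its own row and the column labels come back
        part = pd.DataFrame(df.loc[label].tolist(), index=df.columns).T
        # repeat the row label once per list element
        part.index = np.repeat(label, len(part))
        parts.append(part)
    return pd.concat(parts)


# Three columns but only two elements per list
demo = pd.DataFrame({'a': [[1, 2], [3, 4]],
                     'b': [['p', 'q'], ['r', 's']],
                     'c': [['u', 'v'], ['w', 'z']]},
                    index=['Mon', 'Tue'])
print(explode_rows(demo))
```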

score 0 · Answer 3 · answered Apr 14 '18 at 11:48

A simple iteration might help if every cell contains a list of exactly 3 elements, i.e.:

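# Frame j holds the j-th element of every list; stack the frames and regroup rows by index label
ndf = pd.concat([df.apply(lambda x: pd.Series([i[j] for i in x], index=x.index), axis=1) for j in range(3)]).sort_index()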

     0  1  2
Mon  x  a  a
Mon  y  b  b
Mon  z  c  c
Tue  a  a  x
Tue  b  b  y
Tue  c  c  z
Wed  a  a  a
Wed  b  b  b
Wed  c  c  c

Bharath M Shetty

This solution messes up the order of both the dataframe and the lists. – inquisitiveProgrammer Apr 14 '18 at 12:01
When a dataframe df is passed to your one-liner, 1. the order of the index changes, and 2. the order of the cell values (lists, in this case) also changes when they are separated. Hence it's not the most effective solution, as maintaining the order of the lists is important. – inquisitiveProgrammer Apr 14 '18 at 12:29
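One way to address the ordering concern while keeping the per-position idea is to drop `sort_index()` and instead stably sort the stacked rows by their original row position. A minimal sketch, assuming every list has the same length as the first cell's; `split_by_position` is a hypothetical helper name, and the index labels below are deliberately out of alphabetical order to show that they are no longer re-sorted.

```python
import numpy as np
import pandas as pd


def split_by_position(df):
    """Hypothetical helper: per-position split that keeps row and list order.

    Assumes every list has the same length as the first cell's list.
    """
    n = len(df.iloc[0, 0])
    # frame j holds the j-th element of every list, one row per original row
    parts = [df.apply(lambda row: pd.Series([cell[j] for cell in row], index=row.index), axis=1)
             for j in range(n)]
    stacked = pd.concat(parts)
    # stacked row k came from original row position k % len(df); a stable sort
    # on that position regroups the rows, keeping the original row order and
    # the element order inside each list
    order = np.argsort(np.tile(np.arange(len(df)), n), kind='mergesort')
    return stacked.iloc[order]


df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['x', 'y', 'z']]},
                  index=['Wed', 'Mon'])
print(split_by_position(df))
```

The stable sort (`kind='mergesort'`) matters: rows that came from the same original row tie on position, and only a stable sort guarantees they stay in list order.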
