
I have a dataframe df:

        0               1               2   
Mon ['x','y','z']   ['a','b','c']   ['a','b','c']
Tue ['a','b','c']   ['a','b','c']   ['x','y','z']
Wed ['a','b','c']   ['a','b','c']   ['a','b','c']

The lists may all differ from each other (or some may be identical), and I wish to convert the dataframe to this form:

    0 1 2
Mon x a a
Mon y b b
Mon z c c
Tue a a x
Tue b b y
Tue c c z
Wed a a a
Wed b b b
Wed c c c

I have looked at some previous SO questions: "Explode lists with different lengths in Pandas" and "Split (explode) pandas dataframe string entry to separate rows".

I have tried to use their solutions, but I am unable to get the desired output. How can I achieve this? My attempt so far:

s1 = df[0]
s2 = df[1]
s3 = df[2]
i1 = np.arange(len(df)).repeat(s1.str.len())
i2 = np.arange(len(df)).repeat(s2.str.len())
i3 = np.arange(len(df)).repeat(s3.str.len())
df.iloc[i1, :-1].assign(**{'Shared Codes': np.concatenate(s1.values)})
df.iloc[i2, :-1].assign(**{'Shared Codes': np.concatenate(s2.values)})
df.iloc[i3, :-1].assign(**{'Shared Codes': np.concatenate(s3.values)})

Also, this doesn't seem like a very reasonable way to do it, given that I may have even more columns. I am using Python 2.7.

Cœur

3 Answers


This is one way using itertools.chain and numpy.repeat:

import pandas as pd, numpy as np
from itertools import chain

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])

res = pd.DataFrame({k: list(chain.from_iterable(df[k])) for k in df},
                   index=np.repeat(df.index, list(map(len, df[0]))))

print(res)

#      0  1  2
# Mon  x  a  a
# Mon  y  b  b
# Mon  z  c  c
# Tue  a  a  x
# Tue  b  b  y
# Tue  c  c  z
# Wed  a  a  a
# Wed  b  b  b
# Wed  c  c  c
jpp
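The chain/repeat approach above takes the repeat counts from column 0 only, so it relies on every list in a row having the same length. A minimal sketch of the same idea with that assumption checked up front follows; the helper name `explode_all` and the sample data are made up for illustration and are not part of the original answer.

```python
from itertools import chain

import pandas as pd


def explode_all(df):
    """Explode every list-valued column of df at once (hypothetical helper)."""
    # Length of the list in each cell, as a 2-D integer array shaped like df.
    lengths = df.apply(lambda col: col.map(len)).to_numpy()
    # Within a row, every column must hold a list of the same length.
    if not (lengths == lengths[:, [0]]).all():
        raise ValueError("lists within a row must have equal lengths")
    return pd.DataFrame(
        # Flatten each column's lists into one long list.
        {k: list(chain.from_iterable(df[k])) for k in df},
        # Repeat each index label once per element of that row's lists.
        index=df.index.repeat(lengths[:, 0]),
    )


# Rows may differ in length from each other, as long as each row is consistent.
df = pd.DataFrame({0: [['x', 'y'], ['a', 'b', 'c']],
                   1: [['p', 'q'], ['d', 'e', 'f']]},
                  index=['Mon', 'Tue'])
res = explode_all(df)
print(res)
```

Raising early seems preferable here, because building the frame from columns of unequal flattened length would otherwise fail with a less specific error.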

I'd do it this way:

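import numpy as np
import pandas as pd

dfs = []
for day in df.index:
    # Rows are the original columns; transpose so each list element is a row.
    part = pd.DataFrame(df.loc[day].tolist(), index=df.columns).T
    # Repeat the day label once per exploded row (not once per column).
    part.index = np.repeat(day, len(part))
    dfs.append(part)
result = pd.concat(dfs)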
John Zwinck
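A self-contained sketch of the per-row loop, run on made-up data with two columns, three-element lists, and an index that is not in alphabetical order. In this sketch the number of index labels is taken from the length of the exploded part, which is an assumption about the intended behaviour when the list length and the column count differ.

```python
import pandas as pd

# Sample data (invented for illustration): 2 columns, 3-element lists.
df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c']],
                   1: [['d', 'e', 'f'], ['g', 'h', 'i']]},
                  index=['Wed', 'Mon'])

parts = []
for day in df.index:
    # Rows of this frame are the original columns; transposing turns
    # each list element into its own row.
    part = pd.DataFrame(df.loc[day].tolist(), index=df.columns).T
    # One label per exploded row.
    part.index = [day] * len(part)
    parts.append(part)

# Concatenation keeps the parts in the order they were appended,
# so the original row order and list order are preserved.
result = pd.concat(parts)
print(result)
```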

A simple iteration might help if every list in the columns has exactly 3 elements, i.e.:

ndf = pd.concat([df.apply(lambda x : [i[j] for i in x],1) for j in range(3)]).sort_index()

     0  1  2
Mon  x  a  a
Mon  y  b  b
Mon  z  c  c
Tue  a  a  x
Tue  b  b  y
Tue  c  c  z
Wed  a  a  a
Wed  b  b  b
Wed  c  c  c
Bharath M Shetty
  • This solution messes up the order of the dataframe and the lists. – inquisitiveProgrammer Apr 14 '18 at 12:01
  • When a dataframe df is passed to your one-liner: 1. the order of the index changes; 2. the order of the cell values (lists, in this case) also changes once they are separated. Hence it is not the most effective solution, as maintaining the order of the lists is important. – inquisitiveProgrammer Apr 14 '18 at 12:29
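For the ordering concern raised in the comments, here is a sketch for newer setups only. It assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns. That rules out the Python 2.7 environment in the question, so treat it as an alternative technique rather than a fix for the one-liner above. The sample index is deliberately out of alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Wed', 'Mon', 'Tue'])

# Explode all columns together; lists in the same row must have matching
# lengths, otherwise pandas raises a ValueError.
res = df.explode(df.columns.tolist())
print(res)
```

Because no sorting is involved, the rows stay in their original order and each list's elements stay in their original sequence.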