How to split DataFrame columns into multiple rows?

Question

I am trying to convert multiple columns to multiple rows. Can someone please offer some advice?

I have a DataFrame:

id         values
1,2,3,4    [('a','b'), ('as','bd'), '|', ('ss','dd'), ('ws','ee'), '|', ('rr','rt'), ('tt','yy'), '|', ('yu','uu'), ('ii','oo')]
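The question does not show how this DataFrame is created. A minimal sketch that reproduces it, assuming a single row where id holds one comma-separated string and values holds one Python list of tuples with '|' separators. Note that the column has to be read as df['values'], because df.values is the DataFrame's own array attribute.

```python
import pandas as pd

# Assumption: one row; 'id' is a comma-separated string and 'values'
# is a Python list of tuples separated by '|' markers.
df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})
print(df)
```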

I need it to look like this:

ID       Values
1         ('a','b'), ('as','bd')
2         ('ss','dd'), ('ws','ee')
3         ('rr','rt'), ('tt','yy')
4         ('yu','uu'), ('ii','oo')

I have tried groupby, split and izip, but maybe I am not using them the right way?

Could you please add the code you used to create that DataFrame so we can help you from there? — hysoftwareeng, Oct 21 '19 at 20:10
Possible duplicate? https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows — SRT HellKitty, Oct 21 '19 at 20:25
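The linked duplicate is about exploding list-like cells into separate rows. Below is a hedged sketch of that route, which is a different technique from both answers below. It first splits each row's values list at '|' with a small helper (split_on_bar is a hypothetical name, not a pandas function) and then explodes the id list and the value groups together. It assumes the DataFrame is built as in the question and needs pandas 1.3 or newer for multi-column explode.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})

def split_on_bar(items):
    """Hypothetical helper: split a list into sublists at each '|' marker."""
    chunks = [[]]
    for item in items:
        if item == '|':
            chunks.append([])        # separator: start a new group
        else:
            chunks[-1].append(item)  # tuple: add it to the current group
    return chunks

# One list of ids and one list of value groups per original row...
wide = pd.DataFrame({
    'ID': df['id'].str.split(','),
    'Values': df['values'].apply(split_on_bar),
})
# ...then explode both columns in lockstep (list lengths must match per row)
result = wide.explode(['ID', 'Values'], ignore_index=True)
print(result)
```

Here each Values cell stays a list of tuples rather than a formatted string; that keeps the data usable for further processing.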

score 0 · Accepted Answer · answered Oct 21 '19 at 20:37

I made a quick and dirty example of how you could parse this DataFrame:

# example data: the question's single row as a plain Python list
# (not an actual pandas DataFrame)
data = [
    "1,2,3,4",
    [('a','b'), ('as','bd'), '|', ('ss','dd'), ('ws','ee'), '|', ('rr','rt'), ('tt','yy'), '|', ('yu','uu'), ('ii','oo')]
]

# split ids by comma
ids = data[0].split(",")

# position of the id currently being filled, and one empty list per id
idx = 0
items = {i: [] for i in ids}

# tuples belong to the current id; each '|' moves on to the next id
for v in data[1]:
    if isinstance(v, tuple):
        items[ids[idx]].append(v)
    elif v == '|':
        idx += 1

# print the data in the layout requested in the question
print("ID\tValues")
for key, vals in items.items():
    print("{}\t{}".format(key, vals))
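The loop above fills a plain dict and prints it. If a DataFrame like the one requested is the goal, the dict can be handed to pandas. A minimal sketch, assuming the same single-row input (the loop is repeated only so that the block runs on its own):

```python
import pandas as pd

# Same single-row input as in the answer, as a plain Python list
data = ["1,2,3,4",
        [('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
         ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]]

ids = data[0].split(",")
items = {i: [] for i in ids}
idx = 0
for v in data[1]:
    if v == '|':
        idx += 1                   # separator: move on to the next id
    else:
        items[ids[idx]].append(v)  # tuple: attach it to the current id

# Turn the {id: [tuples]} dict into a two-column DataFrame
result = pd.DataFrame(list(items.items()), columns=['ID', 'Values'])
print(result)
```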

Valdi_Bo · Answer 2 · 2019-10-22T17:28:13.680

I came up with a fairly concise solution based on multi-level grouping, which in my opinion is quite idiomatic pandas.

Start by defining the following function, which "splits" a Series built from a single values element into a sequence of list representations without the surrounding [ and ]. The split occurs at each '|' element:

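def fn(grp1):
    # group number for each element: goes up by 1 at every '|'
    grp2 = (grp1 == '|').cumsum()
    # drop the '|' separators, then render each group as its list repr without [ and ]
    return grp1[grp1 != '|'].groupby(grp2).apply(lambda x: repr(list(x))[1:-1])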

(It will be used a bit later.)
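To see what fn returns on its own, here is a small check on a hand-made Series. The four-element input is only an illustrative assumption, not data from the question:

```python
import pandas as pd

def fn(grp1):
    grp2 = (grp1 == '|').cumsum()
    return grp1[grp1 != '|'].groupby(grp2).apply(lambda x: repr(list(x))[1:-1])

# Illustrative input: two tuples, a separator, then one tuple
s = pd.Series([('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd')])

# (s == '|').cumsum() is [0, 0, 1, 1], so the tuples land in groups 0 and 1
out = fn(s)
print(out)
```

The groupby aligns grp2 with the filtered Series by index label, which is why dropping the '|' rows first does not shift the group numbers.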

The first step is to convert the id column into a Series:

sId = df.id.apply(lambda x: pd.Series(x.split(','))).stack().rename('ID')
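A possibly tidier way to build the same sId swaps the apply/pd.Series call for the vectorized Series.str.split with expand=True. This is a substitution, not the answer's own code, and it assumes df is built as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})

# expand=True returns one column per comma-separated piece (columns 0..3);
# stack() then moves those columns into the second index level
sId = df['id'].str.split(',', expand=True).stack().rename('ID')
print(sId)
```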

For your data the result is:

0  0    1
   1    2
   2    3
   3    4
Name: ID, dtype: object

The first level of the MultiIndex is the index of the source row, and the second level holds consecutive numbers within that row.

Now perform a similar conversion on the values column:

sVal = (pd.DataFrame(df['values'].values.tolist(), index=df.index)
        .stack().groupby(level=0).apply(fn).rename('Values'))

The result is:

0  0      ('a', 'b'), ('as', 'bd')
   1    ('ss', 'dd'), ('ws', 'ee')
   2    ('rr', 'rt'), ('tt', 'yy')
   3    ('yu', 'uu'), ('ii', 'oo')
Name: Values, dtype: object

Note that this MultiIndex has the same structure as the one in sId, which is what lets pd.concat line the two Series up row by row.
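The matching structure matters because pd.concat(..., axis=1) aligns rows by index label, not by position. A toy illustration, with made-up two-row Series:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(0, 0), (0, 1)])
a = pd.Series(['1', '2'], index=idx, name='ID')
b = pd.Series(['x', 'y'], index=idx, name='Values')

# Same MultiIndex on both sides: rows pair up and nothing is missing
aligned = pd.concat([a, b], axis=1)

# Second level shifted by one: the union of labels has 3 rows, with gaps
b_shifted = pd.Series(['x', 'y'],
                      index=pd.MultiIndex.from_tuples([(0, 1), (0, 2)]),
                      name='Values')
misaligned = pd.concat([a, b_shifted], axis=1)
print(aligned)
print(misaligned)
```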

The last step is to concatenate both partial results:

result = pd.concat([sId, sVal], axis=1).reset_index(drop=True)

The result is:

  ID                      Values
0  1    ('a', 'b'), ('as', 'bd')
1  2  ('ss', 'dd'), ('ws', 'ee')
2  3  ('rr', 'rt'), ('tt', 'yy')
3  4  ('yu', 'uu'), ('ii', 'oo')
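For convenience, here are the answer's steps collected into one runnable script. The import and the construction of df are additions (the answer assumes an existing df) and are built to match the question's data:

```python
import pandas as pd

def fn(grp1):
    # group number for each element: goes up by 1 at every '|'
    grp2 = (grp1 == '|').cumsum()
    # drop separators and render each group as its list repr without [ and ]
    return grp1[grp1 != '|'].groupby(grp2).apply(lambda x: repr(list(x))[1:-1])

# Assumed input, matching the question
df = pd.DataFrame({
    'id': ['1,2,3,4'],
    'values': [[('a', 'b'), ('as', 'bd'), '|', ('ss', 'dd'), ('ws', 'ee'), '|',
                ('rr', 'rt'), ('tt', 'yy'), '|', ('yu', 'uu'), ('ii', 'oo')]],
})

# ids -> Series indexed by (source row, position in row)
sId = df.id.apply(lambda x: pd.Series(x.split(','))).stack().rename('ID')

# values -> one column per element, stacked, then grouped per source row
sVal = (pd.DataFrame(df['values'].values.tolist(), index=df.index)
        .stack().groupby(level=0).apply(fn).rename('Values'))

# both Series share the same MultiIndex, so concat lines them up
result = pd.concat([sId, sVal], axis=1).reset_index(drop=True)
print(result)
```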

Will try this solution now. — riderg28, Oct 23 '19 at 14:59
