How to split a column by a delimiter, while respecting the relative position of items to be separated

Question

Below is my script for a generic data frame in Python using pandas. I am hoping to split a certain column in the data frame that will create new columns, while respecting the original orientation of the items in the original column.

Please see below for my clarity. Thank you in advance!

My script:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['x,y,z', 'a,b', 'c']})
print(df)

Here's what I want

df = pd.DataFrame({'col1': ['x',np.nan,np.nan],
                    'col2': ['y','a',np.nan],
                    'col3': ['z','b','c']})
print(df)

Here's what I get

df = pd.DataFrame({'col1': ['x','a','c'],
                    'col2': ['y','b',np.nan],
                    'col3': ['z',np.nan,np.nan]})
print(df)

related to [this question](https://stackoverflow.com/questions/60896456/sort-arrangement-of-pandas-column-values#comment107740120_60896456). — Quang Hoang, Mar 29 '20 at 14:02
Thank you! Although this reveres the relative position, I'm able to select out which columns I want in my new data frame. — Tai, Mar 29 '20 at 14:11

score 2 · Answer 1 · answered Mar 29 '20 at 14:05

You can use the justify function from this answer with Series.str.split:

dfn = pd.DataFrame(
    justify(df['col1'].str.split(',', expand=True).to_numpy(), 
            invalid_val=None, 
            axis=1, 
            side='right')
).add_prefix('col')

   col0  col1 col2
0     x     y    z
1  None     a    b
2  None  None    c

score 1 · Answer 2 · answered Mar 29 '20 at 14:04

Here is a way of tweaking the split:

max_delim = df['col1'].str.count(',').max() #count the max occurance of `,`
delim_to_add = max_delim - df['col1'].str.count(',') #get difference of count from max
# multiply the delimiter and add it to series, followed by split
df[['col1','col2','col3']] = (df['col1'].radd([','*i for i in delim_to_add])
                              .str.split(',',expand=True).replace('',np.nan))
print(df)

  col1 col2 col3
0    x    y    z
1  NaN    a    b
2  NaN  NaN    c

score 1 · Answer 3 · answered Mar 29 '20 at 14:18

1

Try something like

s=df.col1.str.count(',')
#(s.max()-s).map(lambda x : x*',')
#0      
#1     ,
#2    ,,
Name: col1, dtype: object
(s.max()-s).map(lambda x : x*',').add(df.col1).str.split(',',expand=True)
   0  1  2
0  x  y  z
1     a  b
2        c

answered Mar 29 '20 at 14:18

BENY

317,841
20
164
234

How to split a column by a delimiter, while respecting the relative position of items to be separated

3 Answers3