2

Below is my script for a generic data frame in Python using pandas. I am hoping to split a certain column in the data frame that will create new columns, while respecting the original orientation of the items in the original column.

Please see below for my clarity. Thank you in advance!

My script:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['x,y,z', 'a,b', 'c']})
print(df)

Here's what I want

df = pd.DataFrame({'col1': ['x',np.nan,np.nan],
                    'col2': ['y','a',np.nan],
                    'col3': ['z','b','c']})
print(df)

Here's what I get

df = pd.DataFrame({'col1': ['x','a','c'],
                    'col2': ['y','b',np.nan],
                    'col3': ['z',np.nan,np.nan]})
print(df)
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Tai
  • 21
  • 2
  • related to [this question](https://stackoverflow.com/questions/60896456/sort-arrangement-of-pandas-column-values#comment107740120_60896456). – Quang Hoang Mar 29 '20 at 14:02
  • Thank you! Although this reveres the relative position, I'm able to select out which columns I want in my new data frame. – Tai Mar 29 '20 at 14:11

3 Answers3

2

You can use the justify function from this answer with Series.str.split:

dfn = pd.DataFrame(
    justify(df['col1'].str.split(',', expand=True).to_numpy(), 
            invalid_val=None, 
            axis=1, 
            side='right')
).add_prefix('col')

   col0  col1 col2
0     x     y    z
1  None     a    b
2  None  None    c
Erfan
  • 40,971
  • 8
  • 66
  • 78
1

Here is a way of tweaking the split:

max_delim = df['col1'].str.count(',').max() #count the max occurance of `,`
delim_to_add = max_delim - df['col1'].str.count(',') #get difference of count from max
# multiply the delimiter and add it to series, followed by split
df[['col1','col2','col3']] = (df['col1'].radd([','*i for i in delim_to_add])
                              .str.split(',',expand=True).replace('',np.nan))
print(df)

  col1 col2 col3
0    x    y    z
1  NaN    a    b
2  NaN  NaN    c
anky
  • 74,114
  • 11
  • 41
  • 70
1

Try something like

s=df.col1.str.count(',')
#(s.max()-s).map(lambda x : x*',')
#0      
#1     ,
#2    ,,
Name: col1, dtype: object
(s.max()-s).map(lambda x : x*',').add(df.col1).str.split(',',expand=True)
   0  1  2
0  x  y  z
1     a  b
2        c
BENY
  • 317,841
  • 20
  • 164
  • 234