
I have a dataframe in which I need to split a column on the character "Y" while keeping that delimiter. For example,

    import pandas as pd

    d1 = pd.DataFrame({'user': [1,2,3],'action': ['YNY','NN','NYYN']})

The output dataframe should look like this,

    d2 = pd.DataFrame([{'action': 'Y, NY', 'user': 1},
                       {'action': 'NN', 'user': 2},
                       {'action': 'NY, Y, N', 'user': 3}])

    in[1]: d1
    out[1]: action  user
            YNY         1
            NN          2
            NYYN        3

    in[2]: d2
    out[2]: action      user
            Y, NY       1
            NN          2
            NY, Y, N    3

I have looked at a few related questions, such as "Python split() without removing the delimiter" and "Python splitting on regex without removing delimiters", but they are not exactly what I am looking for here.

user42361
  • Is it ok if the last row reads "N, Y, YN"? If not, would you be okay if the first row was YN, Y? – cs95 Jan 04 '19 at 20:10

2 Answers


It sounds like you need to split on a regex with a capturing group, then drop the empty strings:

    d1.action.str.split('([^Y]*Y)').map(lambda x: [z for z in x if z != ''])
    Out[234]: 
    0       [Y, NY]
    1          [NN]
    2    [NY, Y, N]
    Name: action, dtype: object
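To see why this keeps the "Y", here is a minimal sketch of the same idea in plain Python using `re.split`. When the pattern contains a capturing group, the matched text is returned along with the pieces between matches. The helper name `split_keep_y` is hypothetical and only for illustration.

```python
import re

def split_keep_y(s):
    # The capturing group makes re.split return each matched chunk
    # (any run of non-Y characters followed by one Y) in the result.
    parts = re.split(r'([^Y]*Y)', s)
    # re.split leaves empty strings around adjacent matches; drop them.
    return [p for p in parts if p]

for s in ['YNY', 'NN', 'NYYN']:
    print(s, split_keep_y(s))
```

The list comprehension plays the same role as the `map(lambda ...)` step in the pandas one-liner.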
BENY

Use:

    d1['action'].str.split('Y').str.join('Y,').str.rstrip(',')

Output:

    0      Y,NY
    1        NN
    2    NY,Y,N
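If the comma-plus-space format of `d2` in the question is wanted, one possible variant is a single regex replace with a lookahead. This is a different technique from the split/join chain above, sketched under the assumption that a pandas version supporting the `regex=True` argument of `Series.str.replace` is installed.

```python
import pandas as pd

d1 = pd.DataFrame({'user': [1, 2, 3], 'action': ['YNY', 'NN', 'NYYN']})

# Insert ', ' after every 'Y' that is followed by at least one more
# character; the lookahead (?=.) leaves a trailing 'Y' untouched.
d2 = d1.assign(action=d1['action'].str.replace(r'Y(?=.)', 'Y, ', regex=True))
print(d2)
```

Because the lookahead consumes nothing, no trailing separator is produced and no `rstrip` step is needed.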
Vivek Kalyanarangan