1

Good afternoon, i am trying to split text in a column to a specfic format here is my table below

UserId  Application
1       Grey Blue::Black Orange;White:Green
2       Yellow Purple::Orange Grey;Blue Pink::Red

I would like it to read the following:

UserId  Application
    1       Grey Blue
    1       White Orange
    2       Yellow Purple
    2       Blue Pink

Basically, i would like to keep the first string of every :: instance for every string in a given cell.

So far my code is

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')

df['Application']=df.Role.str.split(';|::|').map(lambda x : x[0::2])

unnesting(df.drop('Role',1),['Application']

The following code reads

UserId  Application
        1       Grey Blue, White Orange
        2       Yellow Purple, Blue Pink

Please Assist i dont know where i should be using pandas or numpy to solve this problem!!

James Scott
  • 181
  • 1
  • 13

1 Answers1

2

Maybe you can try using extractall

yourdf=df.set_index('UserId').Application.str.extractall(r'(\w+):').reset_index(level=0) 
# You can adding rename(columns={0:'Application'})at the end
Out[87]: 
       UserId       0
match                
0           1    Grey
1           1   White
0           2  Yellow
1           2    Blue

Update look at the unnesting , after we split and select the value we need from the string , we store them into a list , when you have a list type in you columns , I recommend using unnesting

df['LIST']=df.Application.str.split(';|::|:').map(lambda x : x[0::2])

unnesting(df.drop('Application',1),['LIST'])
Out[111]: 
            LIST  UserId
0      Grey Blue       1
0          White       1
1  Yellow Purple       2
1      Blue Pink       2

My own def-function

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')
BENY
  • 317,841
  • 20
  • 164
  • 234
  • thank you so much! this worked however i misread the output i needed, i re-did my question. Intead of just printing one word like grey white etc. i need it to print out the whole string to the left of the :: as an example i need it to read Grey White instead of grey @Wen-Ben – James Scott Mar 09 '19 at 23:17
  • let me know if you have any thoughts! @Wen-Ben – James Scott Mar 09 '19 at 23:40
  • 1
    @JamesScott hi ,check the update , also if you think the answer help you put , you can always accept(check mark at the left) and upvote for us – BENY Mar 10 '19 at 00:14
  • UPDATE - thanks @Wen-ben we are almost there!! however the ending of this doesnt stack it, it makes it into one single row. It needs to be stacked like in my post so i can do some Power BI visulizations with the data. this is perfect just need to figure out how to stack it like before! it comes out like Grey White, White Orange for 0 user id where as i need it to be 0 Grey White in one row and 0 white orange in the other – James Scott Mar 10 '19 at 00:59
  • update - thanks for all your help hoipefully that isnt confusing ! i also updated the post to show you what i am talking about @Wen-Ben – James Scott Mar 10 '19 at 01:00
  • @JamesScott your columns is role or Application ?? – BENY Mar 10 '19 at 01:46
  • for this excercise we can continue with Applicaiton @Wen-Ben – James Scott Mar 10 '19 at 02:41
  • 1
    @JamesScott did you assign it back ? `df=unnesting(df.drop('Application',1)`,['LIST']) – BENY Mar 10 '19 at 03:00
  • yes i did @Wen-Ben its separating the new string like in my udpated question doing Grey Blue, White Orange. Could be a second code just to de-concate that with a spit/stack method of sorts but to do with what your doing i dont htink im doing anything wrong – James Scott Mar 10 '19 at 03:06