I have a DataFrame like this:

[image: the input DataFrame with the note column]

I want to split the note column and create 3 new columns that consist of the name, country and digit.

The format of the note is always the same, except that the name can be either a first name only or a full name.

I was trying to split starting from the right: take the digit, then the country, and put whatever is left into the "name" column.

The new DataFrame should look like this:

[image: the expected DataFrame with the name, country and digit columns]

Brad Solomon
Ayomikun Samuel

1 Answer

I believe you need Series.str.rsplit with n=2, which splits on only the last 2 whitespace separators:

df[['Name','Country','Digit']] = df['Note'].str.rsplit(n=2, expand=True)

When the fields are separated by single spaces, this is equivalent to:

df[['Name','Country','Digit']] = df['Note'].str.rsplit(' ', n=2, expand=True)
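
The two forms only agree when every separator is a single space. Here is a minimal sketch of the difference, using made-up notes (the second has a double space before the digit) that are not from the question:

```python
import pandas as pd

# Hypothetical notes: the second one has two spaces before the digit.
notes = pd.Series(['Simion Canada 4', 'Simion Canada  4'])

# Default separator (pat=None): a run of whitespace counts as one separator.
default_split = notes.str.rsplit(n=2, expand=True)

# Explicit single space: every space is a separator, so the double space
# yields an empty field and the name and country end up in one column.
space_split = notes.str.rsplit(' ', n=2, expand=True)

print(default_split)
print(space_split)
```

If the notes may contain irregular spacing, the default separator is probably the safer choice.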

Proof:

import pandas as pd

df = pd.DataFrame({
    'ID': [1,2,3,4],
    'Note': [
        'Sam John Brazil 2', 
        'Simion Canada 4',
        'Sam John Brazil 1',
        'Henry G. Hilson Spain 3']
})

df[['Name','Country','Digit']] = df['Note'].str.rsplit(n=2, expand=True)

print(df)

Returns:

   ID                     Note             Name Country  Digit
0   1        Sam John Brazil 2         Sam John  Brazil      2
1   2          Simion Canada 4           Simion  Canada      4
2   3        Sam John Brazil 1         Sam John  Brazil      1
3   4  Henry G. Hilson Spain 3  Henry G. Hilson   Spain      3

If the Digit column should be numeric, also add:

df['Digit'] = pd.to_numeric(df['Digit'], errors='coerce')
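
Here is a small sketch of what errors='coerce' does, and how it differs from the stricter astype(int) conversion. The values are made up, and the last one is deliberately not a number:

```python
import pandas as pd

# Hypothetical column: the last value cannot be parsed as a number.
digits = pd.Series(['2', '4', 'x'])

# errors='coerce' turns unparseable values into NaN instead of raising,
# so the result is float64 rather than an integer dtype.
coerced = pd.to_numeric(digits, errors='coerce')
print(coerced)

# astype(int) is stricter: it raises ValueError on the same input.
try:
    digits.astype(int)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print('astype(int) failed:', exc)
```

When every note is known to end in a valid integer, both approaches should give the same numbers. They differ only in how malformed rows are handled.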

Details:

print(df['Note'].str.rsplit(n=2, expand=True))

                 0       1  2
0         Sam John  Brazil  2
1           Simion  Canada  4
2         Sam John  Brazil  1
3  Henry G. Hilson   Spain  3
Anton vBR
jezrael
  • Smart. Probably add `df['Digit'] = df['Digit'].astype(int)` afterwards too. – Anton vBR Sep 19 '18 at 13:12
  • Hello jezrael! I'm a newbie at Python and I came here just to learn something. Can you explain your answer, please? Thanks in advance. – Simo Sep 19 '18 at 13:13
  • @Simo - Sure, but first I need to create sample data :( – jezrael Sep 19 '18 at 13:13
  • @JohnZwinck That is only true if you assign a scalar to multiple cols. – Anton vBR Sep 19 '18 at 13:21
  • @Simo do you understand it now? – Anton vBR Sep 19 '18 at 13:26
  • If the country name is made up of 2 words (e.g. `United Kingdom`), it doesn't work properly :( – Joe Sep 19 '18 at 13:33
  • @Joe - Yes, you are right. A possible solution is 2 replacements, like `countries2 = ['United Kingdom']; d = {x: x.replace(' ', '_') for x in countries2}; d1 = {v: k for k, v in d.items()}; df[['Name','Country','Digit']] = df['Note'].replace(d, regex=True).str.rsplit(n=2, expand=True); df['Country'] = df['Country'].replace(d1, regex=True)` – jezrael Sep 19 '18 at 14:02
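
The two-replacement idea from the last comment can be laid out as a runnable sketch. The sample rows here are made up for illustration, and the approach assumes that every multi-word country is listed by hand in `countries2`:

```python
import pandas as pd

# Made-up rows; the second note contains a two-word country.
df = pd.DataFrame({
    'ID': [1, 2],
    'Note': ['Sam John Brazil 2', 'Ann Lee United Kingdom 5'],
})

# Assumption: all multi-word countries are known in advance.
countries2 = ['United Kingdom']
d = {x: x.replace(' ', '_') for x in countries2}  # 'United Kingdom' -> 'United_Kingdom'
d1 = {v: k for k, v in d.items()}                 # reverse mapping

# Protect the inner spaces, split from the right, then restore them.
df[['Name', 'Country', 'Digit']] = (
    df['Note'].replace(d, regex=True).str.rsplit(n=2, expand=True)
)
df['Country'] = df['Country'].replace(d1, regex=True)

print(df)
```

A country missing from `countries2` would still be split at its inner space, so the list has to be kept complete for the data at hand.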