How to create new columns with a custom split in Python

Question

I have a DataFrame like this:

I want to split the note column and create 3 new columns that consist of the name, country and digit.

The format of the note is always the same, except that it may contain either just a first name or a full name.

I was trying to split starting from the right: take the digit, then the country, and whatever is left should go into the "name" column.

New DataFrame is meant to look like this:

Please provide us with a [complete and verifiable](https://www.stackoverflow.com/help/mcve) example of what you tried — Simo, Sep 19 '18 at 13:10
@RahulAgarwal That is not the use-case here. We don't want them reversed. But... I have seen similar answers before to this question on SO. — Anton vBR, Sep 19 '18 at 13:32

score 3 · Accepted Answer · edited Sep 19 '18 at 13:27

I believe you need `Series.str.rsplit` with `n=2` to split on only the last 2 whitespace separators:

df[['Name','Country','Digit']] = df['Note'].str.rsplit(n=2, expand=True)

For notes separated by single spaces, this is equivalent to:

df[['Name','Country','Digit']] = df['Note'].str.rsplit(' ', n=2, expand=True)
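
The two calls only agree when every separator is exactly one space. A minimal sketch of the difference, assuming a note with a doubled space (the sample values below are made up for illustration):

```python
import pandas as pd

# Hypothetical notes; the second one has a double space before the digit.
notes = pd.Series(['Sam John Brazil 2', 'Simion Canada  4'])

# No separator: a run of consecutive whitespace counts as one delimiter.
by_whitespace = notes.str.rsplit(n=2, expand=True)

# Explicit ' ': every single space is a delimiter, so the double
# space produces an empty string in the middle column.
by_space = notes.str.rsplit(' ', n=2, expand=True)

print(by_whitespace)
print(by_space)
```

If the notes may contain irregular spacing, the version without an explicit separator is probably the safer choice.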

Proof:

import pandas as pd

df = pd.DataFrame({
    'ID': [1,2,3,4],
    'Note': [
        'Sam John Brazil 2', 
        'Simion Canada 4',
        'Sam John Brazil 1',
        'Henry G. Hilson Spain 3']
})

df[['Name','Country','Digit']] = df['Note'].str.rsplit(n=2, expand=True)

print(df)

Returns:

   ID                     Note             Name Country  Digit
0   1        Sam John Brazil 2         Sam John  Brazil      2
1   2          Simion Canada 4           Simion  Canada      4
2   3        Sam John Brazil 1         Sam John  Brazil      1
3   4  Henry G. Hilson Spain 3  Henry G. Hilson   Spain      3

If the Digit column should be numeric, also add:

df['Digit'] = pd.to_numeric(df['Digit'], errors='coerce')
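
As a rough illustration of what `errors='coerce'` does compared with a plain `astype(int)`, here is a small sketch. It assumes one of the split values is not a valid number; the `digits` Series is invented for the example:

```python
import pandas as pd

# Hypothetical Digit values as strings, one of them malformed.
digits = pd.Series(['2', '4', 'x'])

# errors='coerce' turns values that cannot be parsed into NaN
# instead of raising an exception.
coerced = pd.to_numeric(digits, errors='coerce')

# astype(int) is stricter: it raises ValueError on the malformed value.
try:
    digits.astype(int)
    strict_ok = True
except ValueError:
    strict_ok = False

print(coerced)
print(strict_ok)
```

So `to_numeric` with `errors='coerce'` keeps going and marks bad rows as missing, while `astype(int)` is fine only when every value is known to be a clean integer.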

Details:

#print(df['Note'].str.rsplit(n=2, expand=True))

                 0       1  2
0         Sam John  Brazil  2
1           Simion  Canada  4
2         Sam John  Brazil  1
3  Henry G. Hilson   Spain  3
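
One thing worth checking in the intermediate frame is what happens when a note has fewer than three parts. A minimal sketch, assuming a note with no name at all (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical notes; the second one is missing the name.
notes = pd.Series(['Sam John Brazil 2', 'Canada 4'])

parts = notes.str.rsplit(n=2, expand=True)

# The short row is padded with a missing value on the right, so
# 'Canada' lands in column 0 and '4' in column 1.
print(parts)
```

In that case the values shift one column to the left, so it may be worth validating the rows before assigning the result to Name, Country and Digit.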

answered Sep 19 '18 at 13:11 by jezrael · edited Sep 19 '18 at 13:27 by Anton vBR

Smart. Probably a: `df['Digit'] = df['Digit'].astype(int)` after too. – Anton vBR Sep 19 '18 at 13:12
Hello jezrael! I'm newbie on python and I was there just to learn something. Can you explain your answer please? thanks in advance – Simo Sep 19 '18 at 13:13
@Simo - Sure, but first need create sample data :( – jezrael Sep 19 '18 at 13:13
@JohnZwinck That is only true if you assign a scalar to multiple cols. – Anton vBR Sep 19 '18 at 13:21
@Simo do you understand it now? – Anton vBR Sep 19 '18 at 13:26
If the country is made up of 2 words (e.g. `United Kingdom`), this doesn't work properly :( – Joe Sep 19 '18 at 13:33
@Joe - Yes, you are right. A possible solution is 2 replacements, like `countries2 = ['United Kingdom']`, `d = {x: x.replace(' ', '_') for x in countries2}`, `d1 = {v: k for k, v in d.items()}`, `df[['Name','Country','Digit']] = df['Note'].replace(d, regex=True).str.rsplit(n=2, expand=True)`, `df['Country'] = df['Country'].replace(d1, regex=True)` – jezrael Sep 19 '18 at 14:02
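
A different way to handle multi-word countries, not the replacement trick from the comment above, is a regular expression with `Series.str.extract`. This is only a sketch under assumptions: the sample rows and the `multi_word` list are hypothetical, and the list must contain every multi-word country that can appear in the data.

```python
import re
import pandas as pd

# Hypothetical sample data including a two-word country.
df = pd.DataFrame({'Note': ['Sam John Brazil 2', 'Ann Lee United Kingdom 5']})

# Assumed list of the multi-word countries expected in the data.
multi_word = ['United Kingdom']

# Try the known multi-word countries first, then fall back to one word.
country = '|'.join(map(re.escape, multi_word)) + r'|\S+'
pattern = rf'^(?P<Name>.+?)\s+(?P<Country>{country})\s+(?P<Digit>\d+)$'

# extract returns one column per capture group; rows that do not
# match the pattern come back as missing values.
df[['Name', 'Country', 'Digit']] = df['Note'].str.extract(pattern)

print(df)
```

The name group is non-greedy, so it takes as little as possible and leaves the longest known country intact. Countries missing from the list would still be cut at their last word, just as with `rsplit`.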
