Extract last term after comma into new column

Question

I have a pandas dataframe which is essentially 2 columns and 9000 rows

CompanyName  |  CompanyAddress

and the address is in the form

Line1, Line2, ..LineN, PostCode

i.e. basically different numbers of comma-separated items in a string (or dtype 'object'), and I want to just pull out the post code i.e. the item after the last comma in the field

I've tried the Dot notation string manipulation suggestions (possibly badly):

df_address['CompanyAddress'] = df_address['CompanyAddress'].str.rsplit(', ')

which just put '[ ]' around the fields - I had no success trying to isolate the last component of any split-up/partitioned string, with maxsplit kicking up errors.

I had a small degree of success following EdChums comment to Pandas split Column into multiple columns by comma

pd.concat([df_address[['CompanyName']], df_address['CompanyAddress'].str.rsplit(', ', expand=True)], axis=1)

However, whilst isolating the Postcode, this just creates multiple columns and the post code is in columns 3-6... equally no good.

It feels incredibly close, please advise.

    EmployerName    Address
0   FAUCET INN LIMITED  [Union, 88-90 George Street, London, W1U 8PA]
1   CITIBANK N.A    [Citigroup Centre,, Canary Wharf, Canada Squar...
2   AGENCY 2000 LIMITED     [Sovereign House, 15 Towcester Road, Old Strat...
3   Transform Trust     [Unit 11 Castlebridge Office Village, Kirtley ...
4   R & R.C.BOND (WHOLESALE) LIMITED    [One General Street, Pocklington Industrial Es...
5   MARKS & SPENCER FINANCIAL SERVICES PLC  [Marks & Spencer Financial, Services Kings Mea...

Vaishali · Accepted Answer · 2018-04-04T21:51:31.867

Given the DataFrame,

df = pd.DataFrame({'Name': ['ABC'], 'Address': ['Line1, Line2, LineN, PostCode']})

    Address                         Name
0   Line1, Line2, LineN, PostCode   ABC

If you need only post code, you can extract that using rsplit and re-assign it to the column Address. It will save you the step of concat.

df['Address'] = df['Address'].str.rsplit(',').str[-1]

You get

    Address     Name
0   PostCode    ABC

Edit: Give that you have dataframe with address values in list

df = pd.DataFrame({'Name': ['FAUCET INN LIMITED'], 'Address': [['Union, 88-90 George Street, London, W1U 8PA']]})

    Address                                         Name
0   [Union, 88-90 George Street, London, W1U 8PA]   FAUCET INN LIMITED

You can get last element using

df['Address'] = df['Address'].apply(lambda x: x[0].split(',')[-1])

You get

    Address     Name
0   W1U 8PA     FAUCET INN LIMITED

Hi - this unfortunately gives me the same error as Joe Samanek: `AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas` -- is there something i'm missing from my data types? — BAC83, Apr 04 '18 at 21:28

Joe Samanek · Answer 2 · 2018-04-04T22:18:02.500

1

Just rsplit the existing column into 2 columns - the existing one and a new one. Or two new ones if you want to keep the existing column intact.

df['Address'], df['PostCode'] = df['Address'].str.rsplit(', ', 1).str

Edit: Since OP's Address column is a list with 1 string in it, here is a solution for that specifically:

df['Address'], df['PostCode'] = df['Address'].map(lambda x: x[0]).str.rsplit(', ', 1).str

edited Apr 04 '18 at 22:18

answered Apr 04 '18 at 21:00

Joe Samanek

1,644
12
16

This gives `AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas` – BAC83 Apr 04 '18 at 21:06
Hm, works for me on my sample df. What types are you columns? – Joe Samanek Apr 04 '18 at 21:10
both columns are `dtype` object (apparently) – BAC83 Apr 04 '18 at 21:12
And are you sure all your rows have correct values? This looks like you have some empty rows in there or something. – Joe Samanek Apr 04 '18 at 21:14
Running `np.where(pd.isnull(df))` returns `(array([], dtype=int32), array([], dtype=int32))`; so i think it's ok – BAC83 Apr 04 '18 at 21:22
Oh, just checked your update, and your Address column is not a string, it is an array... Ok I will update my answer. – Joe Samanek Apr 04 '18 at 21:44
Added a solution for your specific data format. Tested it on my sample df, works fine. Btw, next time it is better if you show sample of your data right away, it was really impossible for us to guess that your Address is not a string, but a list with 1 string element in it. – Joe Samanek Apr 04 '18 at 22:10

score 0 · Answer 3 · answered Apr 04 '18 at 21:02

0

rsplit returns a list, try rsplit(‘,’)[0] to get last element in source line

answered Apr 04 '18 at 21:02

Cmaster

72
1
3

Extract last term after comma into new column

3 Answers3