2

I have a pandas dataframe which is essentially 2 columns and 9000 rows

CompanyName  |  CompanyAddress

and the address is in the form

Line1, Line2, ..LineN, PostCode

i.e. basically different numbers of comma-separated items in a string (or dtype 'object'), and I want to just pull out the post code i.e. the item after the last comma in the field

I've tried the Dot notation string manipulation suggestions (possibly badly):

df_address['CompanyAddress'] = df_address['CompanyAddress'].str.rsplit(', ') 

which just put '[ ]' around the fields - I had no success trying to isolate the last component of any split-up/partitioned string, with maxsplit kicking up errors.

I had a small degree of success following EdChums comment to Pandas split Column into multiple columns by comma

pd.concat([df_address[['CompanyName']], df_address['CompanyAddress'].str.rsplit(', ', expand=True)], axis=1)

However, whilst isolating the Postcode, this just creates multiple columns and the post code is in columns 3-6... equally no good.

It feels incredibly close, please advise.

    EmployerName    Address
0   FAUCET INN LIMITED  [Union, 88-90 George Street, London, W1U 8PA]
1   CITIBANK N.A    [Citigroup Centre,, Canary Wharf, Canada Squar...
2   AGENCY 2000 LIMITED     [Sovereign House, 15 Towcester Road, Old Strat...
3   Transform Trust     [Unit 11 Castlebridge Office Village, Kirtley ...
4   R & R.C.BOND (WHOLESALE) LIMITED    [One General Street, Pocklington Industrial Es...
5   MARKS & SPENCER FINANCIAL SERVICES PLC  [Marks & Spencer Financial, Services Kings Mea...
BAC83
  • 811
  • 1
  • 12
  • 27

3 Answers3

5

Given the DataFrame,

df = pd.DataFrame({'Name': ['ABC'], 'Address': ['Line1, Line2, LineN, PostCode']})

    Address                         Name
0   Line1, Line2, LineN, PostCode   ABC

If you need only post code, you can extract that using rsplit and re-assign it to the column Address. It will save you the step of concat.

df['Address'] = df['Address'].str.rsplit(',').str[-1] 

You get

    Address     Name
0   PostCode    ABC

Edit: Give that you have dataframe with address values in list

df = pd.DataFrame({'Name': ['FAUCET INN LIMITED'], 'Address': [['Union, 88-90 George Street, London, W1U 8PA']]})

    Address                                         Name
0   [Union, 88-90 George Street, London, W1U 8PA]   FAUCET INN LIMITED

You can get last element using

df['Address'] = df['Address'].apply(lambda x: x[0].split(',')[-1])

You get

    Address     Name
0   W1U 8PA     FAUCET INN LIMITED
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • Hi - this unfortunately gives me the same error as Joe Samanek: `AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas` -- is there something i'm missing from my data types? – BAC83 Apr 04 '18 at 21:28
  • Can you post df_address['CompanyAddress'].head()? – Vaishali Apr 04 '18 at 21:29
1

Just rsplit the existing column into 2 columns - the existing one and a new one. Or two new ones if you want to keep the existing column intact.

df['Address'], df['PostCode'] = df['Address'].str.rsplit(', ', 1).str

Edit: Since OP's Address column is a list with 1 string in it, here is a solution for that specifically:

df['Address'], df['PostCode'] = df['Address'].map(lambda x: x[0]).str.rsplit(', ', 1).str
Joe Samanek
  • 1,644
  • 12
  • 16
  • This gives `AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas` – BAC83 Apr 04 '18 at 21:06
  • Hm, works for me on my sample df. What types are you columns? – Joe Samanek Apr 04 '18 at 21:10
  • both columns are `dtype` object (apparently) – BAC83 Apr 04 '18 at 21:12
  • And are you sure all your rows have correct values? This looks like you have some empty rows in there or something. – Joe Samanek Apr 04 '18 at 21:14
  • Running `np.where(pd.isnull(df))` returns `(array([], dtype=int32), array([], dtype=int32))`; so i think it's ok – BAC83 Apr 04 '18 at 21:22
  • Oh, just checked your update, and your Address column is not a string, it is an array... Ok I will update my answer. – Joe Samanek Apr 04 '18 at 21:44
  • Added a solution for your specific data format. Tested it on my sample df, works fine. Btw, next time it is better if you show sample of your data right away, it was really impossible for us to guess that your Address is not a string, but a list with 1 string element in it. – Joe Samanek Apr 04 '18 at 22:10
0

rsplit returns a list, try rsplit(‘,’)[0] to get last element in source line

Cmaster
  • 72
  • 1
  • 3