0

I have a data frame with a column location which contains a lot of text and then an actual location code. I'm trying to extract out the location code, and I figured out if I split the string on spaces then I can just grab the last item in each list. For example, in the df below the column location is the original column and after applying my split I now have location_split.

  | location             | location_split
0 | Town (123)           | ['Town', '(123)']
1 | Town Town (123AB)    | ['Town', 'Town', '(123AB)']
2 | Town (40832) (123BC) | ['Town', '(40832)', '(123BC)']
3 | Town (987)           | ['Town', '(987)']

But, how do I make it so that I can pull out the last item in the list and have that be the value for location_split? Something like df['location']=df['location_split'][-1] and end up with the location column below. I did attempt regex, but since some rows have multiple parentheses containing numbers it couldn't differentiate, but splitting them and then grabbing the last item on the list seems the most foolproof.

  | location
0 | (123)
1 | (123AB)
2 | (123BC)
3 | (987)
carousallie
  • 776
  • 1
  • 7
  • 25

2 Answers2

1

You can use .str accessor

df['location'] = df['location_split'].str[-1]
# or
df['location'] = df['location_split'].str.get(-1)
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
0

You can use the regex:

df['location'] = df['location'].astype(str).str.extract('(\(.*\))')
imburningbabe
  • 742
  • 1
  • 2
  • 13