
I have a pandas DataFrame and I am not sure how to start extracting the information.

The data in the column looks like this:

->Company A 100->Company B 60->Company C 80-> Company D
->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G
->Company A 100->Company B 23-> Company D

etc...

The company names are not important. Only the last number in each row matters.

I want to extract the number before the last '->' and put it into another column.

For example, extract 80, 100, and 23, respectively.

80
100
23

So far I have this:

import re
text = '->Company A 100->Company B 60->Company C 80-> Company D'
re.findall(r'\d+', text)[-1]

which gives the correct output, '80'.

But when I apply it to the DataFrame column:

re.findall(r'\d+', df['ColumnName'])[-1]

I get: TypeError: expected string or bytes-like object
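The error occurs because re.findall expects a single string, while df['ColumnName'] is a whole pandas Series. A minimal sketch that keeps the same regex but calls it once per cell with Series.apply (the sample rows are copied from the question; the new column name 'num' is an arbitrary choice):

```python
import re

import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({'ColumnName': [
    '->Company A 100->Company B 60->Company C 80-> Company D',
    '->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G',
    '->Company A 100->Company B 23-> Company D',
]})

# re.findall needs a str, so call it on each cell instead of on the whole Series
df['num'] = df['ColumnName'].apply(lambda s: re.findall(r'\d+', s)[-1])
print(df['num'].tolist())  # ['80', '100', '23']
```

This assumes every cell is a string containing at least one digit; a NaN cell would still raise a TypeError, and a row without digits would raise an IndexError.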

Asked by Mataunited18; edited by jezrael.

2 Answers


If you want the last number in the string, you can reverse the string, extract the first run of digits (which is the last run in the original), and reverse that back:

df['num'] = df['text'].str[::-1].str.extract(r'(\d+)', expand=False)
df['num'] = df['num'].str[::-1]
df

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23
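The reversal trick only finds the last run of digits, so it treats a decimal point as a separator. A small sketch showing the effect on a hypothetical row whose last number is a decimal (the second row is made up for illustration):

```python
import pandas as pd

s = pd.Series([
    '->Company A 100->Company B 60->Company C 80-> Company D',
    # hypothetical row whose last number is a decimal
    '->Company A 100->Company B 82.2-> Company D',
])

# Reverse each string, take the first run of digits (the last run in the
# original string), then reverse that run back into reading order
last = s.str[::-1].str.extract(r'(\d+)', expand=False).str[::-1]
print(last.tolist())  # ['80', '2']
```

For the decimal row only the fractional digit survives, so this variant is best suited to integer data.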

You can also try this if you only want the number before the last '->':

df['num'] = df['text'].str.split('->').str[-2].str.extract(r'(\d+)', expand=False)

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23
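To see why .str[-2] is the right element, here is the same split on one plain string: the leading '->' produces an empty first element, and the last element is the final company name, so the number sits in the second-to-last piece. This is only a per-row illustration of the pandas chain, using re.search to mimic str.extract's first match:

```python
import re

text = '->Company A 100->Company B 60->Company C 80-> Company D'

parts = text.split('->')
print(parts)
# ['', 'Company A 100', 'Company B 60', 'Company C 80', ' Company D']

# parts[-1] is the final company name; parts[-2] holds the number we want
segment = parts[-2]
print(re.search(r'\d+', segment).group())  # 80
```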

If the number before the last '->' can also be a float:

df['text'].str.split('->').str[-2].str.extract(r'(\d+(?:\.\d+)?)')

The only change is in the regex: (?:\.\d+)? optionally matches a decimal part, such as the .2 in 82.2.
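str.extract returns strings, so the new column cannot be used in arithmetic directly. A sketch that applies the float-aware regex and then converts with pd.to_numeric; the second row is a hypothetical variant whose number before the last '->' is a decimal, and errors='coerce' is an assumption that unmatched rows should become NaN:

```python
import pandas as pd

df = pd.DataFrame({'text': [
    '->Company A 100->Company B 60->Company C 80-> Company D',
    '->Company A 100->Company B 53.1->Company C 82.2-> Company D',  # hypothetical variant
    '->Company A 100->Company B 23-> Company D',
]})

# Integer or decimal number in the segment before the last '->'
extracted = (
    df['text']
    .str.split('->')
    .str[-2]
    .str.extract(r'(\d+(?:\.\d+)?)', expand=False)
)

# Convert the extracted strings to numbers; rows with no match become NaN
df['num'] = pd.to_numeric(extracted, errors='coerce')
print(df['num'])
```

Because one value is a decimal, the resulting column has dtype float64, so 80 and 23 become 80.0 and 23.0.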

Answered by Mohit Motwani.

Use a regex that matches floats or integers with str.findall, then take the last match by indexing:

df['num'] = df['col'].str.findall(r'([-+]?\d*\.\d+|\d+)').str[-1]
print(df)


                                                 col  num
0  -> Company A 100->Company B 60->Company C 80->...   80
1  -> Company A 100->Company B 53.1->Company C 82...  100
2         -> Company A 100->Company B 23-> Company D   23
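The order of the alternatives in this pattern matters: the decimal branch [-+]?\d*\.\d+ is tried first, so a value like 82.2 is returned as one match instead of two. A quick check on the question's second row with plain re.findall:

```python
import re

pattern = r'([-+]?\d*\.\d+|\d+)'
text = '->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G'

# The decimal branch is tried first, so 53.1 and 82.2 stay whole
print(re.findall(pattern, text))  # ['100', '53.1', '82.2', '100']

# With only \d+ the decimals are split at the point
print(re.findall(r'\d+', text))  # ['100', '53', '1', '82', '2', '100']
```

Note that this approach, like the reversal trick, returns the last number in the whole string rather than specifically the number before the last '->'; for the sample data the two coincide.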
Answered by jezrael.