3

I have a similar question to this one: Pandas DataFrame: remove unwanted parts from strings in a column.

So I used:

temp_dataframe['PPI'] = temp_dataframe['PPI'].map(lambda x: x.lstrip('PPI/'))

Most of the items start with 'PPI/', but not all. It seems that when an item without the 'PPI/' prefix is encountered, I get this error:

AttributeError: 'float' object has no attribute 'lstrip'

Am I missing something here?

A Rob4
  • Are those caused by missing values, or actual floats? Can you show value of a row that causes this? (Trying to learn and understand here) – bakkal Jun 20 '16 at 10:06

2 Answers

5

use replace:

temp_dataframe['PPI'] = temp_dataframe['PPI'].replace('PPI/', '', regex=True)

or str.replace (assign the result back, since it returns a new Series):

temp_dataframe['PPI'] = temp_dataframe['PPI'].str.replace('PPI/', '')
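
Note that a plain 'PPI/' pattern matches anywhere in the string, not only at the start. A minimal sketch of the difference, assuming a made-up sample Series (the values below are illustrative, not from the question), with the pattern anchored by `^` so only a leading prefix is removed:

```python
import pandas as pd

# Illustrative data only; the second value contains 'PPI/' in the middle.
s = pd.Series(['PPI/abc', 'x/PPI/y', None])

# Literal replacement: removes every occurrence of 'PPI/'.
anywhere = s.str.replace('PPI/', '', regex=False)

# Anchored regex: removes 'PPI/' only when it starts the string.
prefix_only = s.str.replace(r'^PPI/', '', regex=True)

print(anywhere.tolist())
print(prefix_only.tolist())
```

Missing values pass through the `.str` methods as missing, so neither call raises the AttributeError from the question.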
shivsn
3

use vectorised str.lstrip:

temp_dataframe['PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')

It looks like you may have missing values (NaN is a float, hence the error), so you should mask those out or replace them:

temp_dataframe['PPI'] = temp_dataframe['PPI'].fillna('')

or

temp_dataframe.loc[temp_dataframe['PPI'].notnull(), 'PPI'] = temp_dataframe['PPI'].str.lstrip('PPI/')

maybe a better method is to filter using str.startswith and use split and access the string after the prefix you want to remove:

temp_dataframe.loc[temp_dataframe['PPI'].str.startswith('PPI/', na=False), 'PPI'] = temp_dataframe['PPI'].str.split('PPI/').str[1]

As @JonClements pointed out, lstrip removes any leading characters that are 'P', 'I' or '/' rather than removing the prefix, which is what you're after.
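
To make the character-set behaviour concrete, here is a small sketch. The sample values and the `drop_prefix` helper are hypothetical, written only for illustration; the helper is one possible way to remove a real prefix with `map` while leaving non-string values such as NaN untouched:

```python
import numpy as np
import pandas as pd

# Illustrative data only, not taken from the question.
s = pd.Series(['PPI/abc', 'PIPE/xyz', 'plain', np.nan])

# lstrip treats its argument as a set of characters {'P', 'I', '/'},
# so it also eats the leading 'PIP' of 'PIPE/xyz'.
stripped = s.str.lstrip('PPI/')

def drop_prefix(value, prefix='PPI/'):
    """Hypothetical helper: remove prefix from strings, pass other values through."""
    if isinstance(value, str) and value.startswith(prefix):
        return value[len(prefix):]
    return value

cleaned = s.map(drop_prefix)

print(stripped.tolist())
print(cleaned.tolist())
```

The `isinstance` check is what avoids the AttributeError on float (NaN) entries when using `map`.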

update

Another method is to pass a regex pattern that looks for the optional prefix and extracts all characters after it:

temp_dataframe['PPI'].str.extract('(?:PPI/)?(.*)', expand=False)
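
As a rough check of how that pattern behaves, assuming a made-up sample Series (the values are illustrative only): rows with the prefix lose it, rows without it are returned unchanged, and missing values stay missing.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
s = pd.Series(['PPI/abc', 'plain', np.nan])

# One capture group with expand=False returns a Series.
# (?:PPI/)? optionally consumes the prefix; (.*) captures the rest.
result = s.str.extract(r'(?:PPI/)?(.*)', expand=False)

print(result.tolist())
```

Remember to assign the result back to the column if you want to keep it.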
EdChum
  • 2
    Don't forget the `.lstrip` probably isn't what the OP wants - it'll remove all characters that are `P`, `I` or `/` from the beginning of the string - it's not actually removing a prefix if it exists... – Jon Clements Jun 20 '16 at 09:59
  • 2
    Or possibly `temp_dataframe['PPI'].str.extract('(?:PPI/)?(.*)', expand=False)` – Jon Clements Jun 20 '16 at 10:23