I’m cleaning and doing EDA on a time series dataset of revenues. Some entries are prefixed with ‘(R) ’, meaning the value has been revised, so they are displayed like (R) 1000. Example:
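import pandas as pd
# Illustrative only: '(R) 1000' is quoted so the snippet runs; it shows how the value is displayed
df = pd.DataFrame({
'year': ['2005', '2006', '2007'],
'revenue': [500, '(R) 1000', 2200]})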
Strangely, the data type for this column still shows as float64, and the column plots fine as a line plot. In the original Excel spreadsheet, when I select one of these cells, the (R) disappears and only the numerical value is displayed.
I tried the following code:
df['revenue'] = df['revenue'].replace('(R) ','', regex=True)
This code does not raise any errors, but the (R) prefixes are still there when I look at the dataframe afterwards. The (R) seems to act as some kind of placeholder, but I cannot figure out how to remove it, and it conflicts with my other data.
Basically, I just want to change values such as (R) 1000 to 1000.
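For reference, here is a minimal sketch of the result I'm after. It assumes the prefix was actually read in as text, so the column is object dtype rather than float64; the sample values are made up for illustration. The parentheses are escaped because they are regex metacharacters, so an unescaped '(R) ' pattern would not match the literal prefix. If the dtype really is float64, the column can hold only numbers, so there would be no text for this to strip.

```python
import pandas as pd

# Hypothetical sample: assumes the "(R) " prefix was read in as text
df = pd.DataFrame({
    'year': ['2005', '2006', '2007'],
    'revenue': [500, '(R) 1000', 2200],
})

# Cast to str so the .str accessor works on the mixed values,
# then drop the literal "(R) " prefix (parentheses escaped for the regex)
cleaned = df['revenue'].astype(str).str.replace(r'^\(R\)\s*', '', regex=True)

# Convert the cleaned text back to a numeric column
df['revenue'] = pd.to_numeric(cleaned)

print(df['revenue'].tolist())
print(df['revenue'].dtype)
```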