I have a dataframe column which appears as follows and has the following characteristics:

>>> df.dtypes
location     object
sensor_1     object
sensor_2    float64

>>> df['sensor_1'].head(4)
0    3 m3/h
1       NaN
2       NaN
3       NaN
Name: sensor_1, dtype: object

>>> type(df['sensor_1'][0])
str

>>> type(df['sensor_1'][1])
float

My goal is to keep the numeric part of "sensor_1" and have it recognised as float, taking into account that there are nulls which, as I understand, are already recognised as numeric.

I tried a few things which did not work:

pd.to_numeric(df['sensor_1'], errors='coerce')  # it did not change anything

# tried to strip the last 5 characters if not null and then convert the remaining part to float
df['sensor_1'].apply(lambda x: x.str[:-5].astype(float) if pd.notnull(x) else x)
# AttributeError: 'str' object has no attribute 'str'

df['sensor_1'].to_string()  # unsure how to go on from there

So... running out of ideas really and asking for your help. Thank you ^_^

Newbielp

1 Answer

Use Series.str.extract, but first convert the values to strings, and convert the extracted result to floats at the end:

# A single capture group makes extract(..., expand=False) return a Series
df['sensor_1'] = (df['sensor_1'].astype(str)
                                .str.extract(r'(\d+\.*\d*)', expand=False)
                                .astype(float))
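
As a minimal, self-contained sketch of the approach above: the sample frame below is made up to mirror the question's `sensor_1` column (the `5.4 m3/h` row is an assumption added to cover the decimal case), and the pattern uses `\.?` instead of `\.*` so that at most one decimal point is captured. It is an illustration under those assumptions, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's column
df = pd.DataFrame({"sensor_1": ["3 m3/h", np.nan, "5.4 m3/h", np.nan]})

cleaned = (
    df["sensor_1"]
    .astype(str)                                # NaN becomes the string 'nan'
    .str.extract(r"(\d+\.?\d*)", expand=False)  # first number; no match gives NaN
    .astype(float)                              # numeric strings and NaN become float64
)

print(cleaned)
```

Rows that contain no digits (including the former NaN rows) come back as NaN, so the nulls survive the round trip through strings.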
jezrael
  • I am using the first solution, which is close to the one I had in mind. However, I realised that if all the elements of the column "sensor_1" are NaN, I get `AttributeError: Can only use .str accessor with string values!` – Newbielp Nov 26 '19 at 08:18
  • @Newbielp - then the problem is that there are numeric values, so use `df['sensor_1'] = df['sensor_1'].astype(str).str[:-5].astype(float)` – jezrael Nov 26 '19 at 08:20
  • Then I get `ValueError: could not convert string to float:`. Not sure how to deal with it. I am going to try the other solutions you have offered for the different cases that exist in my datasets. What does `extract('(\d+)')` do? – Newbielp Nov 26 '19 at 09:32
  • @Newbielp - It extracts the first integer value – jezrael Nov 26 '19 at 09:32
  • But I may also have numbers with decimals, so I would need 5.4, for example... – Newbielp Nov 26 '19 at 09:35
  • @Newbielp - then change `(\d+)` to `(\d+\.*\d*)` - solution from [this](https://stackoverflow.com/a/28832504) – jezrael Nov 26 '19 at 09:37
  • It worked for decimals, but I still have the problem that if "sensor_1" is completely empty, I get the same `AttributeError`. Maybe I need to do something like `try`/`except` or optimize the `if pd.notnull(x) else x` – Newbielp Nov 26 '19 at 09:46
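
The two errors from the comments can be reproduced and avoided with a small helper. This is a sketch under assumptions: `parse_sensor` is a hypothetical name, and the sample Series are made up. An all-NaN column has dtype float64, so `.str` is only available after `astype(str)`; and slicing the string `'nan'` with `[:-5]` leaves an empty string, which is why the fixed-width slice raised `ValueError`.

```python
import numpy as np
import pandas as pd

def parse_sensor(series: pd.Series) -> pd.Series:
    """Hypothetical helper: pull the first number out of each value as float."""
    as_text = series.astype(str)  # makes .str usable even for an all-NaN column
    numbers = as_text.str.extract(r"(\d+\.?\d*)", expand=False)
    return numbers.astype(float)  # unmatched rows are NaN and stay NaN

# Made-up inputs covering the cases raised in the comments
all_nan = pd.Series([np.nan, np.nan], name="sensor_1")
mixed = pd.Series(["3 m3/h", np.nan, "5.4 m3/h"], name="sensor_1")

parsed_empty = parse_sensor(all_nan)  # all NaN, no AttributeError
parsed_mixed = parse_sensor(mixed)

# Why the fixed-width slice failed: 'nan' is shorter than 5 characters
sliced_nan = "nan"[:-5]  # empty string, which float() cannot parse
```

Note that this pattern ignores signs and exponents, so negative values or scientific notation would need a wider regular expression.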