
Greetings Stack Overflow Community!

I am trying to perform a seemingly simple operation, but it is turning out to be quite frustrating for me!

Allow me to explain in simple terms: I have this dataframe...

print(dfx)

Select a BW Speed
0               50 Mb
1              100 Mb
2              100 Mb
3               50 Mb
4               50 Mb

I need a piece of code that will manipulate only the second column by 1) stripping out the space and the "Mb" characters, then 2) converting the result into an int (or even a float) so that I can perform further comparisons/analysis down the line. I basically just want the numerical part of the data, nothing else!

This is an example of what it should look like ideally:

print(dfx)

Select a BW Speed
0               50
1              100
2              100
3               50
4               50

This is my latest attempt:

dfx['Select a BW Speed'] = dfx['Select a BW Speed'].str.replace(r'\D', '').astype(int)

This results in the following error:

ValueError: cannot convert float NaN to integer
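A minimal sketch that reproduces this kind of failure, assuming (the post does not show the full data) that the real column contains at least one missing value. regex=True is passed explicitly so the sketch behaves the same on pandas versions before and after 2.0, where the default changed to a literal match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the post's format plus one missing entry (an assumption)
dfx = pd.DataFrame({"Select a BW Speed": ["50 Mb", "100 Mb", np.nan]})

# The .str methods pass missing/non-string entries through as NaN
digits = dfx["Select a BW Speed"].str.replace(r"\D", "", regex=True)
print(digits.tolist())

# Casting to a plain NumPy int fails because NaN has no integer representation
raised = False
try:
    digits.astype(int)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```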

What am I doing wrong here? Any help is greatly appreciated :)

Best,

-Christopher

brooklynveezy

1 Answer


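The problem arises from trying to cast NaN values to int: a plain NumPy integer dtype has no representation for NaN, so astype(int) fails. The NaN comes from missing (or non-string) entries in the column, which the .str methods return as NaN. Hence you need pd.to_numeric with errors='coerce' to handle those cases. Here's a way using pandas' str accessor methods: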

pd.to_numeric(dfx['Select a BW Speed'].str.split().str[0], errors='coerce')

0     50
1    100
2    100
3     50
4     50
Name: Select a BW Speed, dtype: int64
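The int64 dtype above holds only because the sample has no missing values. A sketch of what happens when one is present (a hypothetical extra row): errors='coerce' leaves NaN in place, so the result becomes float64. If whole numbers are wanted anyway, pandas' nullable "Int64" dtype can hold them alongside a missing marker; that cast is an optional extra step, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the post's format plus one missing entry (an assumption)
dfx = pd.DataFrame({"Select a BW Speed": ["50 Mb", "100 Mb", np.nan]})

# Take the first whitespace-separated token; NaN rows stay NaN, so dtype is float64
speed = pd.to_numeric(dfx["Select a BW Speed"].str.split().str[0], errors="coerce")

# Optional: nullable integer dtype keeps 50/100 as ints and shows the gap as <NA>
dfx["Select a BW Speed"] = speed.astype("Int64")
print(dfx)
```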

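Or using your own approach (note regex=True: since pandas 2.0, str.replace treats the pattern as a literal string by default):

pd.to_numeric(dfx['Select a BW Speed'].str.replace(r'\D', '', regex=True), errors='coerce')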
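Since the question mentions a float would also be acceptable, one difference between the two approaches is worth a quick check: \D removes every non-digit, including a decimal point. The "1.5 Mb" value below is hypothetical and not in the original data; it is added only to show how the split-based version keeps it intact.

```python
import pandas as pd

# Hypothetical values; "1.5 Mb" is NOT in the original data
s = pd.Series(["50 Mb", "100 Mb", "1.5 Mb"], name="Select a BW Speed")

# Replace approach: \D also strips the ".", so "1.5 Mb" becomes "15"
via_replace = pd.to_numeric(s.str.replace(r"\D", "", regex=True), errors="coerce")

# Split approach: keeps the first token whole, so "1.5" parses as 1.5
via_split = pd.to_numeric(s.str.split().str[0], errors="coerce")

print(via_replace.tolist())  # [50, 100, 15]
print(via_split.tolist())    # [50.0, 100.0, 1.5]
```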
yatu