I am attempting to parse Storm Data from NOAA. After some cleaning and parsing I am left with a DataFrame much like this:
import pandas as pd
data = { 'ID' : [1, 2, 3, 4, 5, 6],
'EVENT_TYPE': ['Flood', 'Hail', 'Fire', 'Tornado', 'Flood', 'Fire'],
'Property_Damage': ['2.5K', 0, '.4M', "1.00K", NaN, "1K"]}
df = pd.DataFrame(data)
For the record, this example DataFrame is just a simplification. The real DataFrame has approx. 25 columns and 22 million rows. I would like to convert the values in df['Property_Damage']
from strings to numeric values. My desired outcome would look similar to [2500, 0, 400000, 1000, 0, 1000]
.
I understand that I am making an assumption that NaN
values can be replaced with 0
. I tried to split records into multiple columns with
damage_property_split = df['Propery_Damage'].str.split([-1], expand=True)
but this does not work with records that are 0
or NaN
.
What is the best way to convert
['2.5K', 0, '.4M', "1.00K", NaN, "1K"]
to [2500, 0, 400000, 1000, 0, 1000]
?
Thank you for any help!!!