Problem
Cleaning sales data. Prices and quantities refuse to properly convert to either float or integer dtypes.
Attempts
1. Converter function that removes "$" and ","
Result = error message: "cannot convert string to float "" "
2. pd.to_numeric() but upon checking column via print(df['price'])
Result = *supposedly* converts dtype to float but printing result of column returns "NaN" for all entries
Expectations
Expecting the dtype in the series to be converted to "float" so that math functions may be performed on the columns. Float columns = ['Price', 'Credit Qty', 'Credit Total']
Before running my script, I ran a check on the price column and noticed my last two rows contain NaN. (Yes, I acknowledge my script coerces errors). Please also note, the last row always contains a text string of applied filters because the .csv data is downloaded from PowerBi dashboards.
My overall goal is to accomplish all required conversions within my read_csv() argument.
CODE
def convert_currency(val):
new_val = val.replace(',','').replace('$', '')
return float(new_val)
df = pd.read_csv("file_name.csv", na_values = [''],
dtype={'Price': convert_currency,
'Credit Qty': lambda x: pd.to_numeric(x, errors='coerce', downcast = "float"),
'Credit Total': convert_currency)
})
df.dtypes