0

Problem

Cleaning sales data. Prices and quantities refuse to properly convert to either float or integer dtypes.

example file

Attempts

1. Converter function that removes "$" and ","
    Result = error message: "cannot convert string to float "" "

2. pd.to_numeric() but upon checking column via print(df['price'])
    Result = *supposedly* converts dtype to float but printing result of column returns "NaN" for all entries

Expectations

Expecting the dtype in the series to be converted to "float" so that math functions may be performed on the columns. Float columns = ['Price', 'Credit Qty', 'Credit Total']

Before running my script, I ran a check on the price column and noticed my last two rows contain NaN. (Yes, I acknowledge my script coerces errors). Please also note, the last row always contains a text string of applied filters because the .csv data is downloaded from PowerBi dashboards.

My overall goal is to accomplish all required conversions within my read_csv() argument.

CODE

    def convert_currency(val):
        new_val = val.replace(',','').replace('$', '')
        return float(new_val)

    df = pd.read_csv("file_name.csv", na_values = [''],
               dtype={'Price': convert_currency,
                      'Credit Qty': lambda x: pd.to_numeric(x, errors='coerce', downcast = "float"),
                      'Credit Total': convert_currency)
                         })
    df.dtypes
  • 1
    Can you provide a sample of what filename.csv can be expected to look like? – Galo do Leste Jan 28 '23 at 06:24
  • It's too late to fix this after importing into pandas. If it finds values in a column that don't fit, it converts the whole column to "object". You should fix up the CSV first, then import it clean. – Tim Roberts Jan 28 '23 at 06:37
  • @Galo do Leste - I attached a picture of my example file. – DerekJohn13 Jan 28 '23 at 19:15
  • @Tim Roberts - Unfortunately, that is exactly what I am trying to avoid. Data can range anywhere from ten to hundreds to thousands and tens of thousands of lines. – DerekJohn13 Jan 28 '23 at 19:15
  • Please don't post (links to) images, make a small sample which contains the essential features, and provide it within code fences as text (see [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/14311263)). – Timus Jan 28 '23 at 21:34
  • I don't understand why you're using the `dtypes` argument? Try the `converters` instead, for example `converters = {"Price": lambda s: float(s.lstrip("$").replace(",", ""))}` with `df = pd.read_csv("data.csv", converters=converters, on_bad_lines="skip")`. Or fix the resp. columns afterwards, for example `df = pd.read_csv("data.csv", on_bad_lines="skip")` and then `df["Price"] = df["Price"].str.replace(r"\$|,", "", regex=True).astype("float")`. – Timus Jan 28 '23 at 21:37
  • Thank you @Timus! Your second suggestion worked like a charm. I tried converters = converters but still received the error message, "cannot convert string to float "" " – DerekJohn13 Jan 29 '23 at 05:18

0 Answers0