I have a dataframe called data. I am trying to clean one of its columns so that I can convert the prices into purely numerical values.
This is how I'm filtering the column to find the incorrect values:
data[data['Incorrect_Price'].astype(str).str.contains(r'[A-Za-z]')]
            Incorrect_Price  Occurences  errors
23                 99 cents         732       1
50   3 dollars and 49 cents         211       1
72         the price is 625         128       3
86        new price is 4.39          19       2
138                 4 bucks           3       1
199           new price 429          13       1
225           price is 9.99           5       1
240        new price is 499           8       2
I have tried
data['Incorrect_Price'][20:51].str.findall(r"(\d+) dollars")
and
data['Incorrect_Price'][20:51].str.findall(r"(\d+) cents")
to find the rows that contain "dollars" and "cents" so I can extract the dollar and cent amounts, but I haven't been able to incorporate this when iterating over all rows in the dataframe.
I would like the results to look like this:
            Incorrect_Price  Desired  Occurences  errors
23                 99 cents      .99         732       1
50   3 dollars and 49 cents     3.49         211       1
72         the price is 625      625         128       3
86        new price is 4.39     4.39          19       2
138                 4 bucks     4.00           3       1
199           new price 429      429          13       1
225           price is 9.99     9.99           5       1
240        new price is 499      499           8       2