
I have been trying to run the following code, but it does not let me convert the columns Amount currency and Amount to float. Below is my code:

import pandas as pd
from datetime import datetime
df= pd.read_csv(r'C:\Users\Desktop\CustomerData.csv')

parsed = pd.to_datetime(df["Date"], errors="coerce").fillna(pd.to_datetime(df["Date"],format="%Y-%d-%m",errors="coerce"))

ordinal = pd.to_numeric(df["Date"], errors="coerce").apply(lambda x: pd.Timestamp("1899-12-30")+pd.Timedelta(x, unit="D"))
df["Date"] = parsed.fillna(ordinal)

newdf = df.loc[(df.Type == "Sales Invoice")]

df2 = newdf.groupby(['Date','Customer','Type'])["Amount currency", "Amount"].apply(lambda x : x.astype(float).sum())

But I am getting the following error:

could not convert string to float: '10,084.80'

I have checked the column contents in the CSV file, and the number columns do not appear to contain any special characters.

Below is the general info of my data set:

Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Date             80 non-null     datetime64[ns]
 1   Customer         80 non-null     object        
 2   Amount currency  80 non-null     object        
 3   Type             80 non-null     object        
 4   Amount           80 non-null     object        
dtypes: datetime64[ns](1), object(4)

The columns Amount currency and Amount cannot be converted to float with the code above.

These are the steps I followed:

1. Import the data set.

2. Clean the date column and arrange the data.

3. Convert the date column to YYYY-MM format.

4. Filter the type to "Sales Invoice" and create a group by.

5. The group by sum generates a set of multiple values instead of the sum of the values, so the values need to be converted to float.

Can anyone help?

Thanks

  • The code you provided is unreadable. Indent it properly and use line breaks – Nehal Birla Jan 27 '23 at 15:43
  • Since the value in the column contains a comma, converting it to float raises an error. Maybe this could help: [Convert number strings with commas in pandas DataFrame to float](https://stackoverflow.com/questions/22137723/convert-number-strings-with-commas-in-pandas-dataframe-to-float) – Nehal Birla Jan 27 '23 at 15:53
  • Does this answer your question? [Convert number strings with commas in pandas DataFrame to float](https://stackoverflow.com/questions/22137723/convert-number-strings-with-commas-in-pandas-dataframe-to-float) – Nehal Birla Jan 27 '23 at 15:56
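
As the comments point out, the comma in '10,084.80' is a thousands separator. One option that is not from this thread's answer is to let `read_csv` handle it at load time through its `thousands` parameter. This is a minimal sketch, assuming the amounts are quoted in the file so the comma is not read as a field delimiter; the sample CSV text is hypothetical.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for CustomerData.csv.
# The amounts are quoted so the comma is not treated as a delimiter.
csv_text = 'Customer,Amount\nA,"10,084.80"\nB,"200.00"\n'

# thousands="," tells the parser to ignore commas inside numeric fields
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df.dtypes)
```

If the column still comes back as `object`, something other than the thousands separator is probably present in the values, and cleaning the strings after loading is the safer route.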

1 Answer


You can use the following code:

df['Amount currency'] = df['Amount currency'].str.replace(r'[^0-9\.]', '', regex=True)
df['Amount currency'] = pd.to_numeric(df['Amount currency'])

The first line will delete any characters that are not a digit or a dot. So, instead of 10,084.80, you will get '10084.80'. Then, by executing the second line, you will successfully convert that to a float.
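
The question needs both amount columns converted before grouping, so the same two steps can be applied in a loop. The following is only a sketch: the sample rows are hypothetical stand-ins for the real CSV, and the `amount_cols` and `totals` names are illustrative.

```python
import pandas as pd

# Hypothetical sample rows standing in for CustomerData.csv
df = pd.DataFrame({
    "Customer": ["A", "A", "B"],
    "Type": ["Sales Invoice", "Sales Invoice", "Sales Invoice"],
    "Amount currency": ["10,084.80", "200.00", "1,500.50"],
    "Amount": ["10,084.80", "200.00", "1,500.50"],
})

amount_cols = ["Amount currency", "Amount"]
for col in amount_cols:
    # Drop everything except digits and the decimal point, then convert
    cleaned = df[col].str.replace(r"[^0-9\.]", "", regex=True)
    df[col] = pd.to_numeric(cleaned)

# Once the columns are numeric, a plain groupby sum adds the values
totals = df.groupby(["Customer", "Type"])[amount_cols].sum()
print(totals)
```

Converting before the `groupby` means the `apply(lambda x: x.astype(float).sum())` step from the question should no longer be needed.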

Beware: if you have negative numbers (e.g. -10,084.80), the first line will delete the '-' and you will end up with a positive float. If you need to keep the '-' to get a negative float, change the regex to r'[^0-9\.\-]', which keeps the minus sign and gives you -10084.80.
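
The difference between the two patterns can be checked on a single string with the standard `re` module. This is a small illustration only; the `raw`, `unsigned` and `signed` names are made up for the example.

```python
import re

raw = "-10,084.80"

# Pattern without '-': the minus sign is stripped along with the comma
unsigned = re.sub(r"[^0-9\.]", "", raw)

# Pattern that also allows '-': the sign survives
signed = re.sub(r"[^0-9\.\-]", "", raw)

print(float(unsigned))  # 10084.8
print(float(signed))    # -10084.8
```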

Jose Vega
  • Just to clarify, I provided two lines instead of a one-liner to put readability first. – Jose Vega Jan 27 '23 at 16:00