I am trying to convert a CSV into a required format, where a text field contains currency data in the format "A$ 1,000.00".
I can replace the "A$ " using: df.Credit.str.replace('A$ ','',regex=False)
and then later convert the remaining string value to a float by casting it at the time of use, but I forgot the thousands are comma separated.
When importing the csv into the dataframe, I can use the thousands separator keyword, but because the column is imported as a string, it's not given a numeric value (because of the 'A$ ').
So I need to run the conversion of the comma AFTER importing it.
Is there a way I can do it all in the initial read of the CSV?
This is what I'd come up with so far, but doesn't work because it is out of order:
import pandas as pd
from collections import defaultdict
file = 'mydatafile.csv'
data = pd.read_csv(file,thousands=',')
data.Credit = data.Credit.str.replace('A$ ','',regex=False)
sales = defaultdict(float)
for k,v in data.iterrows():
    sales[k]+=float(v.Credit)
print(dict(sales))
There are a couple of similar questions, but they either lack answers or don't apply, e.g.:
"Pandas: Read CSV: ValueError: could not convert string to float": I'm already using the thousands separator without success.
"Panda load csv string to float": again, not the same, and the solution is unrelated to my problem.
edit: I have also found this similar but opposite question, where someone wants to apply a format over the data, whereas I'd like to remove it.
Can I somehow apply a regex that encompasses both the removal of the A$ and the subsequent commas? Or is there a way to have the data be 'accepted' in the same way spreadsheets 'ignore' currency symbols? I know this isn't a spreadsheet, but if pandas could be told that a string of this format is actually a float, that would solve my issue.
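For illustration, a minimal sketch of what a single regex covering both the "A$ " prefix and the commas might look like. The sample values are made up, and the character class assumes the only non-numeric characters in the column are "A", "$", commas and whitespace:

```python
import pandas as pd

# Hypothetical sample values in the "A$ 1,000.00" format described above
credit = pd.Series(["A$ 1,000.00", "A$ 25.50", "A$ 12,345,678.90"])

# Inside a character class "$" is a literal, so this drops every
# "A", "$", "," and whitespace character in one pass
cleaned = credit.str.replace(r"[A$,\s]", "", regex=True).astype(float)

print(cleaned.tolist())
```

This still runs after the import rather than during it, so it only shortens the chained replace calls; it does not change when the conversion happens.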
edit: for the time being, I have implemented Björn's answer with an extra .str to make it work, such that:
data.Credit = data.Credit.str.replace('A$ ','',regex=False).str.replace(',','').astype(float)
complete code:
import pandas as pd
from collections import defaultdict
file = 'mydatafile.csv'
data = pd.read_csv(file,thousands=',')
data.Credit = data.Credit.str.replace('A$ ','',regex=False).str.replace(',','').astype(float)
sales = defaultdict(float)
for k,v in data.iterrows():
    sales[k]+=float(v.Credit)
print(dict(sales))
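As for the original goal of doing everything in the initial read, one option that might fit is the converters argument of pd.read_csv, which applies a function to each raw string in a column while the file is parsed. This is only a sketch: parse_aud is a hypothetical helper name, and the inline CSV text stands in for mydatafile.csv, whose real layout beyond the Credit column is not shown here.

```python
import io
import pandas as pd

def parse_aud(text):
    """Turn a string such as 'A$ 1,000.00' into a float (hypothetical helper)."""
    stripped = text.replace("A$", "").replace(",", "").strip()
    # Assumption: empty cells should become NaN rather than raise an error
    return float(stripped) if stripped else float("nan")

# Made-up stand-in for mydatafile.csv; only the Credit column comes from the question
csv_text = 'Item,Credit\nwidget,"A$ 1,000.00"\ngadget,A$ 25.50\n'

# The converter runs on each raw Credit string during parsing,
# so the column arrives as floats with no clean-up step afterwards
data = pd.read_csv(io.StringIO(csv_text), converters={"Credit": parse_aud})

print(data.Credit.tolist())
```

With a converter in place the thousands=',' keyword is no longer needed for this column, since the helper removes the commas itself.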