I have several rows of text. The first row is a header row, and each subsequent line represents the fields of data, each value is separated with a comma. Within each line are one to three dollar values, ranging from single digit dollar values ($4.50) to triple digit ($100,000.34). They are also surrounded by quotes.
206360941,5465685679,"$4,073.77",567845676547,"$88,457.21",34589309683
I need to eliminate the quotations and dollar sign for the money values, as well as the comma inside. The period separator for the decimal value needs to stay, so "$6,801.56" becomes 6801.56
I've used regex to eliminate the dollar sign as well as quotations--
with open("datafile.csv", "r") as file:
data = file.readlines()
for i in data:
i = re.sub('[$"]', '', i)
which then makes the data look like 7545245,6,801.56,3545647
so if I split by a comma, it cuts larger values in two.
['206360941,5465685679,4,073.77,567845676547,88,457.21,34589309683']
I thought about splitting by quotations, doing some more regex and rejoining with .join() but it turns out that only the currency values with a comma contain quotations, the smaller values with no comma do not.
Also, I know I can use re.findall(r'\$\d{1,3}\,\d\d\d\.\d\d', i)
to draw out the number format, if I print it, it will output a list like [$100,351.35]
I am just not sure what to do with it after that.