0

this is how my data looks like when opened in Microsoft Excel.
enter image description here
As can be seen all the contents of the cell except 218(aligned to the right of the cell) are parsed as strings(aligned to the left of the cell). It is because they start with white space(it is " 4,610" instead of "4610").

I would like to remove all those white spaces at the beginning and also replace those commas(not the ones that make csvs csvs) because if comma exists 4 and 610 may be read into different cells.

Here's what I tried:

this is what i tried with inspiration from this stackoverflow answer:

import csv
import string

with open("old_dirty_file.csv") as bad_file:
    reader = csv.reader(bad_file, delimiter=",")
    with open("new_clean_file.csv", "w", newline="") as clean_file:
        writer = csv.writer(clean_file)
        for rec in reader:
            writer.writerow(map(str.replace(__old=',', __new='').strip, rec))

But, I get this error:

Traceback (most recent call last):
  File "C:/..,,../clean_files.py", line 9, in <module>
    writer.writerow(map(str.replace(__old=',', __new='').strip, rec))
TypeError: descriptor 'replace' of 'str' object needs an argument

How do I clean those files?

Naveen Reddy Marthala
  • 2,622
  • 4
  • 35
  • 67

1 Answers1

2

Just need to separate replacement from stripping because python doesn't know which string the replacement should be made in.

for rec in reader:
    rec = (i.replace(__old=',', __new='') for i in rec)
    writer.writerow(map(str.strip, rec))

or combine them into a single function:

repstr = lambda string, old=',', new='': string.replace(old, new).strip()
for rec in reader:
    writer.writerow(map(repstr, rec))
kendfss
  • 435
  • 5
  • 11