Strip white spaces from CSV file

Question

I need to stripe the white spaces from a CSV file that I read

import csv

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)

score 45 · Answer 1 · edited Jun 22 '13 at 22:24

45

There's also the embedded formatting parameter: skipinitialspace (the default is false) http://docs.python.org/2/library/csv.html#csv-fmt-params

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, skipinitialspace=False,delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)

edited Jun 22 '13 at 22:24

Óscar López

232,561
37
312
386

answered Jun 14 '13 at 21:34

CaraW

451
5
3

5

To stripe white spaces: `reader = csv.reader(f, skipinitialspace=True,delimiter=',', quoting=csv.QUOTE_NONE)`, right? – Caco Sep 18 '17 at 20:51
4

Not removing trailling spaces? Why is there not option for this??? – SiL3NC3 Mar 01 '21 at 12:22

CivFan · Answer 2 · 2018-12-03T18:24:17.957

In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys), when using csv.DictReader.

Create a class based on csv.DictReader, and override the fieldnames property to strip out the whitespace from each field name (aka column header, aka dictionary key).

Do this by getting the regular list of fieldnames, and then iterating over it while creating a new list with the whitespace stripped from each field name, and setting the underlying _fieldnames attribute to this new list.

import csv

class DictReaderStrip(csv.DictReader):
    @property                                    
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames
            # Note: DictReader is an old-style class, so can't use super()
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames

If your whitespace is only to the left of the field name like `' date'` and not `'date '` or `' date '`, [`csv.DictReader(csvfile, skipinitialspace=True)`](https://docs.python.org/3/library/csv.html#csv.Dialect.skipinitialspace) works just as well. — Boris Verkhovskiy, Jan 13 '19 at 18:52

score 12 · Answer 3 · answered Feb 14 '13 at 23:27

12

with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]

answered Feb 14 '13 at 23:27

mgilson

300,191
65
633
696

This is the best possible solution using the `csv` module. The `Reader` class returned by the `csv.reader()` function isn't exposed in the `csv` or `_csv` module to allow overriding its `next()` method. – CivFan Sep 09 '15 at 16:45
2

@CivFan It's even better to yield a generator, especially with big files: `for row in reader: yield (c.strip() for c in row)` – Nuno André Aug 04 '18 at 12:34

score 4 · Answer 4 · answered Feb 14 '13 at 23:26

4

You can do:

aList.append([element.strip() for element in row])

answered Feb 14 '13 at 23:26

sapi

9,944
8
41
71

Nuno André · Answer 5 · 2018-08-04T18:40:35.407

The most memory-efficient method to format the cells after parsing is through generators. Something like:

with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)

But it may be worth moving it to a function that you can use to keep munging and to avoid forthcoming iterations. For instance:

nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield clean(row)

Or it can be used to factorize a class:

def factory(reader):
    fields = next(reader)

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))

score 3 · Answer 6 · answered May 30 '14 at 14:41

You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the csv file with cvs.DictReader.

import re

class CSVSpaceStripper:
  def __init__(self, filename):
    self.fh = open(filename, "r")
    self.surroundingWhiteSpace = re.compile("\s*;\s*")
    self.leadingOrTrailingWhiteSpace = re.compile("^\s*|\s*$")

  def close(self):
    self.fh.close()
    self.fh = None

  def __iter__(self):
    return self

  def next(self):
    line = self.fh.next()
    line = self.surroundingWhiteSpace.sub(";", line)
    line = self.leadingOrTrailingWhiteSpace.sub("", line)
    return line

Then use it like this:

o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")

I hardcoded ";" to be the delimiter. Generalising the code to any delimiter is left as an exercise to the reader.

Expanding this solution to the general case would end up in a re-write of the `csv` module. — CivFan, Sep 09 '15 at 16:00

score 2 · Answer 7 · answered May 02 '18 at 10:49

Read a CSV (or Excel file) using Pandas and trim it using this custom function.

#Definition for strippping whitespace
def trim(dataset):
    trim = lambda x: x.strip() if type(x) is str else x
    return dataset.applymap(trim)

You can now apply trim(CSV/Excel) to your code like so (as part of a loop, etc.)

dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))

score 0 · Answer 8 · answered Feb 22 '21 at 10:37

and here is Daniel Kullmann excellent solution adapted to Python3:

import re

class CSVSpaceStripper:
    """strip whitespaces around delimiters in the file
    NB has hardcoded delimiter ";"
    """

    def __init__(self, filename):
        self.fh = open(filename, "r")
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fh.readline()
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line

score 0 · Answer 9 · answered Jul 26 '21 at 15:25

0

I figured out a very simple solution:

import csv

with open('filename.csv') as f:
  reader = csv.DictReader(f)
  rows = [ { k.strip(): v.strip() for k,v in row.items() } for row in reader ]

answered Jul 26 '21 at 15:25

timothyzhang

730
9
12

score 0 · Answer 10 · answered Aug 08 '22 at 13:02

0

The following code may help you:

import pandas as pd

aList = pd.read_csv(r'filename.csv', sep='\s*,\s*', engine='python')

answered Aug 08 '22 at 13:02

Alexander Martins

343
1
11

Strip white spaces from CSV file

10 Answers10

Linked

Related