141

I am trying to read a file into pandas. The values are separated by spaces, but the number of spaces between them varies. I tried:

pd.read_csv('file.csv', delimiter=' ')

but it doesn't work.

smci
yemu
  • 4
    Possible duplicate of [How to make separator in read\_csv more flexible wrt whitespace?](http://stackoverflow.com/questions/15026698/how-to-make-separator-in-read-csv-more-flexible-wrt-whitespace) – e4c5 Jan 18 '17 at 12:44
  • result = pd.read_table('file.csv', sep='\s+') – Mohamed Fathallah Dec 29 '22 at 03:02

5 Answers

233

Add the delim_whitespace=True argument; it's faster than using a regex.
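A minimal sketch of whitespace-delimited parsing, assuming made-up in-memory data (the sample text and the helper name `read_whitespace_separated` are illustrative, not from the question). Since `delim_whitespace` was deprecated in pandas 2.2 in favor of `sep=r"\s+"`, the sketch uses the latter spelling, which the pandas documentation describes as equivalent:

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by a varying number of spaces.
text = "a  b     c\n1 2   3\n4     5 6\n"


def read_whitespace_separated(source):
    """Read a file (or file-like object) whose fields are split by runs of whitespace."""
    return pd.read_csv(source, sep=r"\s+")


df = read_whitespace_separated(io.StringIO(text))
print(df)
```

The same call should work with a file path in place of the `io.StringIO` object.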

HYRY
  • 5
    should add that, and remove `delimiter=' '` as they are mutually exclusive in recent versions. – matanster Aug 08 '18 at 13:05
  • 10
    @matanster: `delimiter=' '` is very brittle, it says to expect one and only one space. No tabs, newlines, multiple spaces, nonbreaking whitespaces, combination of these etc. `delimiter='\s+'` is what pandas recommends and is more robust. – smci Jan 16 '20 at 12:28
  • `sep="\s+"` argument also works – PJ_ May 26 '22 at 13:55
52

You can use a regex as the delimiter:

pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
  • 5
    This helps when you have more than just a space as delimiter. In current versions one should add `engine = "python"` to avoid a warning. – Jürg W. Spaak Mar 20 '18 at 09:45
  • 2
    Sorry for commenting old reply here, what does `r` before `"\s+"` mean? – AlphaF20 Sep 01 '21 at 04:23
  • 1
    @AlphaF20 it means read as raw string literal: https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-prefixes-do-and-what-are-raw-string-literals – PJ_ May 26 '22 at 13:57
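To illustrate the raw-string point from the comments, here is a small sketch using only the standard library (the variable names and the sample string are made up for illustration). It shows that `r"\s+"` and `"\\s+"` are the same pattern, and what that pattern does when used to split a line:

```python
import re

# A raw string keeps the backslash as a literal character, so these
# two spellings produce the same three-character pattern.
pattern_raw = r"\s+"
pattern_escaped = "\\s+"
same = pattern_raw == pattern_escaped

# \s+ matches one or more whitespace characters (spaces, tabs, ...).
fields = re.split(pattern_raw, "1   2\t3")

print(same, fields)
```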
2

Pandas read_fwf for the win:

import pandas as pd

df = pd.read_fwf(file_path)
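A hedged sketch of what `read_fwf` does, assuming a made-up sample in which the columns are aligned at fixed character positions (the data and column names are not from the question). By default `read_fwf` infers the column boundaries from the first rows of the file (the `infer_nrows` parameter), so it tends to suit aligned, report-style files rather than arbitrarily spaced values:

```python
import io

import pandas as pd

# Hypothetical fixed-width sample: each column occupies the same
# character positions on every line.
text = (
    "name   qty  price\n"
    "apple   10   1.50\n"
    "kiwi     3   0.25\n"
)

# Column boundaries are inferred from the whitespace gaps shared by all rows.
df = pd.read_fwf(io.StringIO(text))
print(df)
```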
erickfis
2

You can also pass a regular expression as the delimiter to read_table, and it is fast :).

result = pd.read_table('file', sep=r'\s+')
Mohamed Fathallah
0

If you can't get text parsing to work using the accepted answer (e.g. if your text file contains non-uniform rows), then it's worth trying Python's csv library. Here's an example using a user-defined dialect:

import csv

# Dialect that splits on a space and skips any extra spaces that follow it.
csv.register_dialect('skip_space', delimiter=' ', skipinitialspace=True)
# my_file is the path to your text file.
with open(my_file, newline='') as f:
    reader = csv.reader(f, dialect='skip_space')
    for item in reader:
        print(item)
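For comparison, a related sketch for rows with differing numbers of fields. It swaps in plain `str.split()` rather than the `csv` module, so it is an alternative technique and not this answer's method; the sample data is made up. Rows are split on runs of whitespace and then padded into a DataFrame:

```python
import io

import pandas as pd

# Hypothetical ragged sample: rows have 3, 2 and 4 fields.
text = "a  b   c\n1 2\n3    4 5 6\n"

# str.split() with no argument splits on any run of whitespace and
# ignores leading/trailing whitespace; blank lines are skipped.
rows = [line.split() for line in io.StringIO(text) if line.strip()]

# Shorter rows are padded with missing values when the DataFrame is built.
df = pd.DataFrame(rows)
print(df)
```

All values stay as strings here, so numeric columns would still need converting, for example with `pd.to_numeric`.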
Pierz