How to read file with space separated values in pandas

Question

I am trying to read a file into pandas. The values are separated by spaces, but the number of spaces between values varies. I tried:

pd.read_csv('file.csv', delimiter=' ')

but it doesn't work

Possible duplicate of [How to make separator in read_csv more flexible wrt whitespace?](http://stackoverflow.com/questions/15026698/how-to-make-separator-in-read-csv-more-flexible-wrt-whitespace) – e4c5, Jan 18 '17 at 12:44

score 233 · Accepted Answer · answered Oct 28 '13 at 11:06

Add the `delim_whitespace=True` argument; it's faster than using a regex separator.

HYRY

should add that, and remove `delimiter=' '` as they are mutually exclusive in recent versions. – matanster Aug 08 '18 at 13:05
@matanster: `delimiter=' '` is very brittle: it says to expect one and only one space. No tabs, newlines, multiple spaces, non-breaking whitespace, combinations of these, etc. `delimiter='\s+'` is what pandas recommends and is more robust. – smci Jan 16 '20 at 12:28
`sep="\s+"` argument also works – PJ_ May 26 '22 at 13:55
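To make the comments above concrete, here is a minimal sketch of reading data whose columns are separated by a varying amount of whitespace. The sample text and the column names `a`, `b`, `c` are made up for illustration, and `io.StringIO` stands in for a real file. It uses `sep=r"\s+"` rather than `delim_whitespace=True` because, as far as I know, recent pandas releases deprecate `delim_whitespace` in favor of it.

```python
import io

import pandas as pd

# Hypothetical sample: runs of spaces and a tab between values.
raw = "a   b  c\n1 2     3\n4\t5 6\n"

# r"\s+" treats any run of whitespace as a single separator.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

With a real file, pass its path in place of the `io.StringIO(raw)` object.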

score 52 · Answer 2 · answered Oct 28 '13 at 10:16

52

You can use a regex as the delimiter:

pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")

This helps when you have more than just a space as delimiter. In current versions one should add `engine = "python"` to avoid a warning. – Jürg W. Spaak Mar 20 '18 at 09:45
Sorry for commenting old reply here, what does `r` before `"\s+"` mean? – AlphaF20 Sep 01 '21 at 04:23
@AlphaF20 it means read as raw string literal: https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-prefixes-do-and-what-are-raw-string-literals – PJ_ May 26 '22 at 13:57
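A small sketch of what the `r` prefix changes, using only the standard library. The variable names and the sample string are illustrative. In a raw string the backslash is kept literally, so the regex engine receives `\s+` unchanged; without the prefix, `"\s"` is an unrecognized escape sequence that newer Python versions warn about.

```python
import re

# Raw string: the backslash is kept as a literal character.
pattern = r"\s+"

# The same pattern written without the r prefix needs a doubled backslash.
print(pattern == "\\s+")

# Three characters: backslash, 's', '+'.
print(len(pattern))

# The regex splits on any run of whitespace (spaces, tabs, ...).
fields = re.split(pattern, "1   2\t3")
print(fields)
```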

score 2 · Answer 3 · answered Nov 21 '22 at 14:23

Pandas read_fwf for the win:

import pandas as pd

df = pd.read_fwf(file_path)
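As a rough illustration, assuming the columns line up at fixed positions (which is what `read_fwf` is designed for), the sketch below lets pandas infer the column boundaries from the data. The sample table and its column names are invented, and `io.StringIO` stands in for `file_path`. If the columns are not aligned, the inferred boundaries may be wrong and a whitespace separator is probably the safer choice.

```python
import io

import pandas as pd

# Hypothetical fixed-width sample: every column starts and ends at the
# same character positions on every line.
raw = (
    "name   qty  price\n"
    "apple    3   1.50\n"
    "pear    12   0.75\n"
)

# With no colspecs given, read_fwf infers them from the first rows.
df = pd.read_fwf(io.StringIO(raw))
print(df)
```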

erickfis

score 2 · Answer 4 · answered Dec 29 '22 at 03:07

You can pass a regular expression as a delimiter for read_table also, and it is fast :).

result = pd.read_table('file', sep=r'\s+')

Mohamed Fathallah

score 0 · Answer 5 · Pierz · 2020-12-09T18:22:25.090

If you can't get text parsing to work using the accepted answer (e.g. if your text file contains non-uniform rows), then it's worth trying Python's csv library. Here's an example using a user-defined Dialect:

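import csv

# Hypothetical path; replace it with your own file.
my_file = 'file.csv'

# skipinitialspace=True ignores the extra spaces that follow each delimiter.
csv.register_dialect('skip_space', skipinitialspace=True)
with open(my_file, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=' ', dialect='skip_space')
    for item in reader:
        print(item)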
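Since the question is about pandas, one possible follow-up is to hand the parsed rows to a DataFrame. This is only a sketch under assumptions: the sample text is invented, `io.StringIO` replaces the file, the first row is taken as the header, and the values are left as strings. Rows that are shorter than the header end up with missing values in the trailing columns.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: runs of spaces and one row with a missing value.
raw = "a  b   c\n1 2\n3    4 5\n"

# skipinitialspace=True drops the extra spaces after each delimiter.
reader = csv.reader(io.StringIO(raw), delimiter=" ", skipinitialspace=True)
rows = list(reader)

# First row is used as the header; shorter rows are padded with missing values.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```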

edited Dec 09 '20 at 18:22

answered May 21 '20 at 21:05

Pierz

That's not true. It works with python 3.8 and pandas. The question asks for reading a text file in pandas. – Spas Nov 30 '20 at 10:40
Ah sorry - I have updated my answer to account for this. – Pierz Dec 09 '20 at 18:23
