How to load a space-separated file into a pandas DataFrame?

Question

I want to load space-separated data into a pandas DataFrame. If I use sep='\s+', I get the error `CParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 9`

df = pd.read_table("data.rpt",sep='\s+',index_col=False)

I was able to open this file in Excel using space as the delimiter. How can I solve this issue with pandas?

First lines of the file:

Id IdEvent  Agent   Sist  Group   Con CInt
-- -------  -----   ----  -----   --- ----
18 2016101  B0C     XCX   ROD F   DC  0
19 2016101  A0C     DCX   APT     AD  5
15 2016103  V0C     XCX   ROD S   DC  0
16 2016102  N0C     XCX   ROD     CD  0

What does line 5 of your data look like? From the error, it seems like it has more spaces than the previous lines do. Perhaps one of the fields contains a value with a space in it. — root, Feb 13 '17 at 17:24
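To illustrate the point in the comment above, here is a minimal sketch that counts whitespace-separated tokens per line. It assumes the real file resembles the sample posted in the question; the `sample` string and the `field_counts` name are only for illustration.

```python
# Sample copied from the question; the real file is assumed to look similar.
sample = """\
Id IdEvent  Agent   Sist  Group   Con CInt
-- -------  -----   ----  -----   --- ----
18 2016101  B0C     XCX   ROD F   DC  0
19 2016101  A0C     DCX   APT     AD  5
15 2016103  V0C     XCX   ROD S   DC  0
16 2016102  N0C     XCX   ROD     CD  0
"""

# str.split() with no arguments splits on runs of whitespace,
# which is roughly what a whitespace separator does in the parser.
field_counts = [len(line.split()) for line in sample.splitlines()]

# Print the 1-based line number next to the number of tokens found.
for line_number, count in enumerate(field_counts, start=1):
    print(line_number, count)
```

In this sample, the rows whose Group value contains a space (`ROD F`, `ROD S`) produce one more token than the header, which is the kind of mismatch the tokenizing error reports.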

score 1 · Answer 1 · answered Feb 13 '17 at 16:27


Add delim_whitespace=True as an argument.


Thomas Lehoux


I tried it as well. Still getting the error `CParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 9` – Dinosaurius Feb 13 '17 at 16:29
Does your file have a header? – Thomas Lehoux Feb 13 '17 at 16:32
Yes, it has a header, and the next line after the header looks like `--- -------------- ---` (something like this). – Dinosaurius Feb 13 '17 at 16:34
I would need to set something like: `header=True` and `skip_lines=0`, which means that the first line is header and the next one should be skipped. Is it possible? – Dinosaurius Feb 13 '17 at 16:37
IMO the best way is to skip the first rows composing the header (with skiprows argument). – Thomas Lehoux Feb 13 '17 at 16:39
I am putting these parameters `delim_whitespace=True,skiprows=0,header=1,index_col=False`, but still the same error. – Dinosaurius Feb 13 '17 at 17:03
Try setting `skiprows=5` if 5 is the line at which you have the `--- -------------- ---` – Thomas Lehoux Feb 13 '17 at 17:27
Can I pass the list of rows? How to do it? – Dinosaurius Feb 13 '17 at 17:46
Do you have a fixed header? Can you edit your question and post your header? – Thomas Lehoux Feb 13 '17 at 17:48
Yes, I posted the first lines from a file. – Dinosaurius Feb 13 '17 at 17:51
Try this: `df = pd.read_csv("data.rpt",delim_whitespace=True, skiprows=2)` – Thomas Lehoux Feb 13 '17 at 17:56
I think the problem is that some entries have content like `Done ABC AR`, which includes a space. However, why can I open it in Excel by setting Delimiter = Space? – Dinosaurius Feb 13 '17 at 18:03
I did some preprocessing in Vim and now I can load it into pandas. – Dinosaurius Feb 13 '17 at 18:11

score 0 · Accepted Answer · answered Feb 13 '17 at 20:14

Use the read_fwf() method, which reads fixed-width columns, so a space inside a value does not split the field:

In [125]: pd.read_fwf(fn, skiprows=[1])
Out[125]:
   Id  IdEvent Agent Sist  Group Con  CInt
0  18  2016101   B0C  XCX  ROD F  DC     0
1  19  2016101   A0C  DCX    APT  AD     5
2  15  2016103   V0C  XCX  ROD S  DC     0
3  16  2016102   N0C  XCX    ROD  CD     0
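A self-contained version of the call above may be easier to try out. This sketch assumes the file has the same fixed-width layout as the sample in the question and reads it from an in-memory string instead of the `fn` path; `sample` is an illustrative name, not part of the original answer.

```python
import io

import pandas as pd

# Sample copied from the question, used in place of the real file.
sample = """\
Id IdEvent  Agent   Sist  Group   Con CInt
-- -------  -----   ----  -----   --- ----
18 2016101  B0C     XCX   ROD F   DC  0
19 2016101  A0C     DCX   APT     AD  5
15 2016103  V0C     XCX   ROD S   DC  0
16 2016102  N0C     XCX   ROD     CD  0
"""

# read_fwf infers column boundaries from character positions.
# skiprows=[1] drops the second line (0-based index 1), i.e. the dashes.
df = pd.read_fwf(io.StringIO(sample), skiprows=[1])

print(df)
print(df["Group"].tolist())
```

Because the columns are located by position, values such as `ROD F` stay in a single Group column. If the inferred boundaries turn out wrong on the real file, the `colspecs` or `widths` parameters of `read_fwf` can be set explicitly.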
