2

I want to load space-separated data into a pandas DataFrame. If I use sep='\s+', I get the error: CParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 9

df = pd.read_table("data.rpt", sep=r'\s+', index_col=False)

I was able to open this file in Excel using space as a delimiter. How can I solve this issue with pandas?

First lines of the file:

Id IdEvent  Agent   Sist  Group   Con CInt
-- -------  -----   ----  -----   --- ----
18 2016101  B0C     XCX   ROD F   DC  0
19 2016101  A0C     DCX   APT     AD  5
15 2016103  V0C     XCX   ROD S   DC  0
16 2016102  N0C     XCX   ROD     CD  0
Dinosaurius
  • You can parse it with regex. Include the file if you can. – Mohammad Yusuf Feb 13 '17 at 16:47
  • What does line 5 of your data look like? From the error, it seems like it has more spaces than the previous lines do. Perhaps one of the fields contains a value with a space in it. – root Feb 13 '17 at 17:24
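
The comment above can be checked directly. The following is a minimal sketch that counts whitespace-separated tokens per line, using the sample rows from the question (the real data.rpt may differ). The helper name count_fields is made up for this illustration and is not part of pandas.

```python
# Sample rows copied from the question; the real data.rpt may differ.
lines = [
    "Id IdEvent  Agent   Sist  Group   Con CInt",
    "-- -------  -----   ----  -----   --- ----",
    "18 2016101  B0C     XCX   ROD F   DC  0",
    "19 2016101  A0C     DCX   APT     AD  5",
    "15 2016103  V0C     XCX   ROD S   DC  0",
    "16 2016102  N0C     XCX   ROD     CD  0",
]


def count_fields(text_lines):
    """Return the number of whitespace-separated tokens on each line."""
    return [len(line.split()) for line in text_lines]


field_counts = count_fields(lines)
expected = field_counts[0]  # the header row defines the expected width

# 1-based line numbers whose token count differs from the header's.
bad_lines = [i for i, n in enumerate(field_counts, start=1) if n != expected]

print(field_counts)
print(bad_lines)
```

Rows whose Group value contains a space (for example "ROD F") produce one token more than the header, which is the kind of mismatch the tokenizer error reports.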

2 Answers

1

Add delim_whitespace=True as an argument to pd.read_table.

Thomas Lehoux
0

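Use the read_fwf() function, which reads fixed-width columns instead of splitting on a delimiter (fn is the path to the file; skiprows=[1] skips the dashed line under the header):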

In [125]: pd.read_fwf(fn, skiprows=[1])
Out[125]:
   Id  IdEvent Agent Sist  Group Con  CInt
0  18  2016101   B0C  XCX  ROD F  DC     0
1  19  2016101   A0C  DCX    APT  AD     5
2  15  2016103   V0C  XCX  ROD S  DC     0
3  16  2016102   N0C  XCX    ROD  CD     0
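
For anyone who wants to reproduce this without the file, here is a self-contained sketch of the same call. It assumes the sample rows from the question are representative of the real file and feeds them to read_fwf through io.StringIO instead of a path; the variable name sample is only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real data.rpt may differ.
sample = "\n".join([
    "Id IdEvent  Agent   Sist  Group   Con CInt",
    "-- -------  -----   ----  -----   --- ----",
    "18 2016101  B0C     XCX   ROD F   DC  0",
    "19 2016101  A0C     DCX   APT     AD  5",
    "15 2016103  V0C     XCX   ROD S   DC  0",
    "16 2016102  N0C     XCX   ROD     CD  0",
])

# Column boundaries are inferred from the text layout;
# skiprows=[1] drops the dashed line under the header.
df = pd.read_fwf(io.StringIO(sample), skiprows=[1])

print(df)
print(df["Group"].tolist())
```

If the inferred boundaries turn out wrong on a larger file, read_fwf also accepts explicit colspecs or widths arguments.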
MaxU - stand with Ukraine