Difficult file format to read with pandas: misaligned output

Question

I have a file in the following format:

 7.2393690416406E+000 1.0690994646755E+001 3.1429089063731E+000
-2.7606309583594E+000 1.0690994646755E+001 1.3142908906373E+001

That is, in the first column, non-negative values are preceded by one space, while negative values have no leading space. Therefore, if you read it with code like the following:

import pandas as pd

df = pd.read_csv('example.csv', header=None, engine='python', sep=' ')

You will get something like this:

1           NaN   7.239369  10.690995   3.142909
2     -2.760631  10.690995  13.142909        NaN

This happens because pandas treats the leading space as a separator, so it sees an empty first field on every line that starts with a space. The DataFrame does contain all the values, but every row whose first value is non-negative is shifted one column to the right relative to the rows whose first value is negative. How can I fix this and get a clean DataFrame like the following?
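If a file has already been read with `sep=' '`, one possible after-the-fact repair (an alternative to the fix in the answer, not part of it) is to drop the missing entries in each row and stack what remains. This sketch assumes every line holds exactly three real values and that the file has no genuinely missing data; the names `raw`, `df_bad`, and `df_fixed` are hypothetical.

```python
import io

import numpy as np
import pandas as pd

raw = (
    " 7.2393690416406E+000 1.0690994646755E+001 3.1429089063731E+000\n"
    "-2.7606309583594E+000 1.0690994646755E+001 1.3142908906373E+001\n"
)
df_bad = pd.read_csv(io.StringIO(raw), header=None, engine="python", sep=" ")

# Keep only the non-missing values of each row, which removes the empty field
# created by the leading space. Assumes each row then has the same length (3).
values = [row[~np.isnan(row)] for row in df_bad.to_numpy(dtype=float)]
df_fixed = pd.DataFrame(np.vstack(values))
print(df_fixed)
```

Reading the file correctly in the first place is simpler and does not depend on the no-missing-data assumption.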

1      7.239369  10.690995   3.142909
2     -2.760631  10.690995  13.142909

Does this answer your question? [How to make separator in pandas read\_csv more flexible wrt whitespace, for irregular separators?](https://stackoverflow.com/questions/15026698/how-to-make-separator-in-pandas-read-csv-more-flexible-wrt-whitespace-for-irreg) — AMC, May 03 '20 at 00:23

score 2 · Accepted Answer · answered May 02 '20 at 22:44


Use sep=r'\s+', which treats any run of whitespace as a single separator and ignores the leading space:

import pandas as pd

df = pd.read_csv('test.csv', header=None, sep=r'\s+')

          0          1          2
0  7.239369  10.690995   3.142909
1 -2.760631  10.690995  13.142909
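For a self-contained check, the same call can be run on the question's two sample lines via `io.StringIO` standing in for `test.csv` (the variable name `raw` is illustrative). The raw-string prefix in `r"\s+"` avoids Python's invalid-escape-sequence warning for `"\s"`; pandas recognizes this particular pattern as whitespace splitting, so leading whitespace on a line does not create an empty column.

```python
import io

import pandas as pd

raw = (
    " 7.2393690416406E+000 1.0690994646755E+001 3.1429089063731E+000\n"
    "-2.7606309583594E+000 1.0690994646755E+001 1.3142908906373E+001\n"
)

# Any run of whitespace is one separator, so both rows yield three aligned columns.
df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+")
print(df)
```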

Trenton McKinney
