I'm trying to set up a Python script that reads in many fixed-width data files and converts them to CSV. To do this I'm using pandas like this:
pandas.read_fwf('source.txt', colspecs=column_position_length).\
to_csv('output.csv', header=column_name, index=False, encoding='utf-8')
where column_position_length and column_name are lists containing the information needed to read and write the data.
Within these files I have long strings of numbers representing test answers. For instance: 333133322122222223133313222222221222111133313333
represents the correct answers on a multiple choice test, so this is really a code rather than a numeric value. The problem I'm having is that pandas interprets these values as floats and then writes them to the csv in scientific notation (3.331333221222221e+47).
I found a lot of questions regarding this issue, but none of them quite resolved it.
- Solution 1 - I believe at this point the values have already been converted to floats, so this wouldn't help.
- Solution 2 - according to the pandas documentation, dtype is not supported as an argument for read_fwf in Python.
- Solution 3 (use converters) - the issue with using converters is that you need to specify the column name or index for each column you want to convert, but I would like to read all of the columns as strings.
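For what it's worth, here is one way the converters idea could cover every column without naming each one: build the dict from the column indices, which are known from the length of the colspecs list. The file contents and column spans are made up; this is a sketch, not a tested production answer.

```python
import pandas as pd

# Hypothetical sample file with the same made-up layout as before.
with open('source.txt', 'w') as f:
    f.write('001  333133322122222223133313222222221222111133313333\n')

column_position_length = [(0, 3), (5, 53)]   # assumed column spans

# Map every column *index* to str so no column is inferred as a float;
# with header=None the columns are numbered 0..n-1, so no names are needed.
converters = {i: str for i in range(len(column_position_length))}

df = pd.read_fwf('source.txt', colspecs=column_position_length,
                 header=None, converters=converters)
```

With this, each answer key survives as the original digit string rather than a float.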
The second option seems to be the go-to answer for reading every column in as a string, but unfortunately it just isn't supported for read_fwf. Any suggestions?