I'm trying to set up a Python script that reads in many fixed-width data files and converts them to CSV. To do this I'm using pandas like this:
pandas.read_fwf('source.txt', colspecs=column_position_length).\
to_csv('output.csv', header=column_name, index=False, encoding='utf-8')
where column_position_length and column_name are lists containing the information needed to read and write the data.
Within these files I have long strings of numbers representing test answers. For instance: 333133322122222223133313222222221222111133313333
represents the correct answers on a multiple choice test, so this is really a code rather than a numeric value. The problem I'm having is that pandas interprets these values as floats and then writes them to the csv in scientific notation (3.331333221222221e+47).
I found a lot of questions regarding this issue, but none of them quite resolved it.
- Solution 1 - I believe at this point the values have already been converted to floats, so this wouldn't help.
- Solution 2 - according to the pandas documentation, dtype is not supported as an argument for read_fwf in Python.
- Solution 3 (use converters) - the issue with using converters is that you need to specify the column name or index for each column you want to convert, but I would like to read all of the columns as strings.
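For what it's worth, here is one way the converters idea could cover every column without naming each one: build the dict from the column indices, which are known from the length of the colspecs list. The file contents and column spans are made up; this is a sketch, not a tested production answer.

```python
import pandas as pd

# Hypothetical sample file with the same made-up layout as before.
with open('source.txt', 'w') as f:
    f.write('001  333133322122222223133313222222221222111133313333\n')

column_position_length = [(0, 3), (5, 53)]   # assumed column spans

# Map every column *index* to str so no column is inferred as a float;
# with header=None the columns are numbered 0..n-1, so no names are needed.
converters = {i: str for i in range(len(column_position_length))}

df = pd.read_fwf('source.txt', colspecs=column_position_length,
                 header=None, converters=converters)
```

With this, each answer key survives as the original digit string rather than a float.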
The second option seems to be the go-to answer for reading every column in as a string, but unfortunately it just isn't supported for read_fwf. Any suggestions?