
I am reading a fixed-width file with read_fwf, and I need whitespace to be preserved because the form uses blanks as a response type. I have looked for a keyword or parameter that changes the default behavior of stripping this whitespace, but I'm not sure one exists. I found the exact same question asked here before, but the asker eventually solved it without using pandas read_fwf.

After reading the file into a DataFrame with read_fwf, I tried padding the beginnings and ends of my strings with extra characters, but that didn't help: the information had already been lost when the whitespace was stripped during parsing.

df = pd.read_fwf(file, widths=widths)

  • There's a section on Files with Fixed Width Columns [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html). Not sure if you saw that? – run-out Jul 12 '19 at 18:52
  • It says that the parser "takes care of white space", but I want it to leave excess white space if certain rows do not reach the width limits. I saw that this problem was discussed on the pandas git repo, https://github.com/pandas-dev/pandas/issues/16772, but I couldn't figure out if the commits made fixed the problem – dylanbking97 Jul 12 '19 at 20:05
  • It looks like they just updated the documentation to reference the whitespace behaviour. It was suggested in the comments that an option be added to allow for keeping whitespaces, but this was not completed, at least not in this thread on Git. – run-out Jul 12 '19 at 21:39
  • Ah I see. Thanks for the help! – dylanbking97 Jul 14 '19 at 00:43

1 Answer


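This worked for me. Passing delimiter="\n\t" changes which characters read_fwf treats as filler, so spaces are no longer stripped, and colspecs=[(0, 5000)] reads each line as one wide column:

df = pd.read_fwf(file, header=None, colspecs=[(0, 5000)], delimiter="\n\t")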
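
The same delimiter override also seems to work with real column widths rather than one wide column. The following is a minimal sketch, not a definitive recipe: the sample text, the widths, and the variable names are made up for illustration, and keep_default_na=False and dtype=str are my own additions so that every field comes back as a plain string.

```python
import io

import pandas as pd

# Hypothetical sample: a 5-character field followed by a 2-character field.
sample = "ab   cd\n     ef\n"
widths = [5, 2]

# Default behaviour: spaces and tabs count as filler and are stripped.
stripped = pd.read_fwf(io.StringIO(sample), widths=widths, header=None, dtype=str)

# Overriding delimiter takes the space out of the set of filler characters.
# keep_default_na=False (an assumption of this sketch) avoids NaN conversion.
preserved = pd.read_fwf(
    io.StringIO(sample),
    widths=widths,
    header=None,
    dtype=str,
    delimiter="\n\t",
    keep_default_na=False,
)

print(stripped[0].tolist())
print(preserved[0].tolist())
```

Comparing the two printed lists shows whether the trailing and all-blank fields survive in your pandas version.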

This is related to pandas issue #16772; see the linked commit: https://github.com/alanbato/pandas/commit/ad1d3a1688fd489404e91ecc0017c2abc1a322a4

Charles