read_csv shifting column headers

Question

I am trying to read in a comma separated text file into Python with read_csv. However, Python is taking the header and shifting it over to the right by one.

Data file example with less columns than I actually have: (example file with more data: https://www.dropbox.com/s/5glujwqux6d0msh/test.txt?dl=0)

DAY,TIME,GENVEG,LATI,LONGI,AREA,CHEM
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07, NaN
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07, NaN

If I try the following (where infile_fire is the above txt file):

df_fires = pd.read_csv(infile_fire,sep="\,",skipinitialspace=True,engine='python')

I get this below. As you can see, DAY is actually above what should be the TIMEcolumn. (Note that the value in the AREA column comes from data I have in the larger dataset which isn't shown in the sample subset above)

I also tried df_fires = pd.read_csv(infile_fire).reset_index(), and though it does create a new index (as I'd like it to do), it also moves the 226 column over and names it index instead of DAY as it should.

I've also tried the following, but still got the same result (shifted headers)

df = pd.read_csv(infile_fire)

df = pd.read_csv(infile_fire,index_col=None)

df = pd.read_csv(infile_fire,index_col=0)

How can I fix this? I just want to read in the text file and have Python set up a new index and keep the headers as is.

That's because the sample has less data than my actual data set, which has over 40 columns. I just put a sample up above for reference. — SugaKookie, Feb 26 '19 at 00:53
Cannot reproduce as is. The index column contains row numbers if I use `read_csv()` the same way you did against exactly the csv data you've presented. — alecxe, Feb 26 '19 at 00:53
My point is to make the examples consistent so it is clear what you want. — cs95, Feb 26 '19 at 00:54

SugaKookie · Accepted Answer · 2019-02-26T17:31:02.400

1

Setting index to False solves this issue.

df = pd.read_csv(infile_fire,index_col=False)

edited Feb 26 '19 at 17:31

answered Feb 26 '19 at 17:12

SugaKookie

780
2
17
41

Dyno Fu · Answer 2 · 2019-02-26T01:01:52.977

0

without fiddling with the options, like pandas just does the right thing, see the sep in the doc of read_csv and csv.Sniffer.

from io import StringIO

import pandas as pd

data = """
DAY,TIME,GENVEG,LATI,LONGI,AREA
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
"""

df = pd.read_csv(StringIO(data))
df

edited Feb 26 '19 at 01:01

answered Feb 26 '19 at 00:54

Dyno Fu

8,753
4
39
64

I need to read in the data as a `.txt` file because this has to eventually be used as a function – SugaKookie Feb 26 '19 at 01:33
use stringio is just for illustration purpose, the point is the read_csv will use csv.Sniffer and it will figure it out what is the dialect of your csv is. just don't provide the extra options like sep etc. you need to replace the StringIO with your own file handle of couse. – Dyno Fu Feb 26 '19 at 01:54

score 0 · Answer 3 · answered Feb 26 '19 at 01:58

As file.txt beeing your file that you want to read.

file.txt = """
    DAY,TIME,GENVEG,LATI,LONGI,AREA
     226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
     226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
    """

Using:

import pandas as pd

Read the file:

df = pd.read_csv('file.txt')

If you take a look at your df.AREA[0], it will be something like this:

'    0.1000000000D+07'

Use regular expressions to remove blank spaces:

df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)

If you try to call your df now, the result will be:

   DAY  TIME  GENVEG               LATI             LONGI              AREA
0  226  1200       2  -0.5548999786D+01  0.3167600060D+02  0.1000000000D+07
1  226  1115       2  -0.1823500061D+02  0.3668500137D+02  0.1000000000D+07

So, your df.AREA[0] will be somthing like this:

'0.1000000000D+07'

Just like the others, for example: df.LATI[0]

'-0.5548999786D+01'

read_csv shifting column headers

3 Answers3

Linked