0

I am trying to read in a comma separated text file into Python with read_csv. However, Python is taking the header and shifting it over to the right by one.

Data file example with less columns than I actually have: (example file with more data: https://www.dropbox.com/s/5glujwqux6d0msh/test.txt?dl=0)

DAY,TIME,GENVEG,LATI,LONGI,AREA,CHEM
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07, NaN
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07, NaN

If I try the following (where infile_fire is the above txt file):

df_fires = pd.read_csv(infile_fire,sep="\,",skipinitialspace=True,engine='python')

I get this below. As you can see, DAY is actually above what should be the TIMEcolumn. (Note that the value in the AREA column comes from data I have in the larger dataset which isn't shown in the sample subset above) enter image description here

I also tried df_fires = pd.read_csv(infile_fire).reset_index(), and though it does create a new index (as I'd like it to do), it also moves the 226 column over and names it index instead of DAY as it should. enter image description here

I've also tried the following, but still got the same result (shifted headers)

df = pd.read_csv(infile_fire)

df = pd.read_csv(infile_fire,index_col=None)

df = pd.read_csv(infile_fire,index_col=0)

How can I fix this? I just want to read in the text file and have Python set up a new index and keep the headers as is.

SugaKookie
  • 780
  • 2
  • 17
  • 41

3 Answers3

1

Setting index to False solves this issue.

df = pd.read_csv(infile_fire,index_col=False)

SugaKookie
  • 780
  • 2
  • 17
  • 41
0

without fiddling with the options, like pandas just does the right thing, see the sep in the doc of read_csv and csv.Sniffer.

from io import StringIO

import pandas as pd

data = """
DAY,TIME,GENVEG,LATI,LONGI,AREA
 226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
 226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
"""

df = pd.read_csv(StringIO(data))
df

enter image description here

Dyno Fu
  • 8,753
  • 4
  • 39
  • 64
  • I need to read in the data as a `.txt` file because this has to eventually be used as a function – SugaKookie Feb 26 '19 at 01:33
  • use stringio is just for illustration purpose, the point is the read_csv will use csv.Sniffer and it will figure it out what is the dialect of your csv is. just don't provide the extra options like sep etc. you need to replace the StringIO with your own file handle of couse. – Dyno Fu Feb 26 '19 at 01:54
0

As file.txt beeing your file that you want to read.

file.txt = """
    DAY,TIME,GENVEG,LATI,LONGI,AREA
     226,  1200,     2,   -0.5548999786D+01,    0.3167600060D+02,    0.1000000000D+07
     226,  1115,     2,   -0.1823500061D+02,    0.3668500137D+02,    0.1000000000D+07
    """

Using:

import pandas as pd

Read the file:

df = pd.read_csv('file.txt')

If you take a look at your df.AREA[0], it will be something like this:

'    0.1000000000D+07'

Use regular expressions to remove blank spaces:

df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)

If you try to call your df now, the result will be:

   DAY  TIME  GENVEG               LATI             LONGI              AREA
0  226  1200       2  -0.5548999786D+01  0.3167600060D+02  0.1000000000D+07
1  226  1115       2  -0.1823500061D+02  0.3668500137D+02  0.1000000000D+07

So, your df.AREA[0] will be somthing like this:

'0.1000000000D+07'

Just like the others, for example: df.LATI[0]

'-0.5548999786D+01'
Leonardo Ferreira
  • 385
  • 1
  • 3
  • 11