
I'm trying to read a csv in pandas. My file starts like:

 Site,Tank ID,Product,Volume,Temperature,Dip Time
   aaa,bbb,....
   .....

I read it with:

df = pd.DataFrame()
date_col = ['Dip Time']
data = pd.read_csv(atg_path, delimiter=',', skiprows=[1], skipinitialspace=True,
                   dayfirst=True,
                   parse_dates=date_col)

This skips the first row of data, but I need it.

If I use skiprows=[0], then I get errors on some columns, e.g. ValueError: 'Dip Time' is not in list

I don't know why. It shouldn't skip any of the data. What is wrong?

smci
Ratha
  • Do you want to skip reading the **header**, or the **first row of data** (*"aaa,bbb,..."*)? What are you actually trying to achieve with `skiprows=[0]`? Your question is unclear. – smci Oct 04 '19 at 05:25
  • `skiprows = 0` (integer) means *"don't skip any rows"*, so it has no effect. Whereas `skiprows = [0]` (list with one element, 0) means *"skip the 0'th row, i.e. the header row"*, so it skips the header (with column names) and reads in the data. – smci Oct 04 '19 at 05:28
  • The [`pandas.read_csv()` doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=skiprows) explains what `skiprows` does, both as an integer and as a list – smci Oct 04 '19 at 07:02

1 Answer


I think the skiprows parameter is not necessary here; you can omit it.

If you pass the integer 0, it means don't skip any rows:

skiprows=0

import pandas as pd
from io import StringIO

temp="""Site,Tank ID,Product,Volume,Temperature,Dip Time
aaa,bbb,ccc,ddd,eee,fff
a,b,c,d,e,f
"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp))
print (df)
  Site Tank ID Product Volume Temperature Dip Time
0  aaa     bbb     ccc    ddd         eee      fff
1    a       b       c      d           e        f

temp="""Site,Tank ID,Product,Volume,Temperature,Dip Time
aaa,bbb,ccc,ddd,eee,fff
a,b,c,d,e,f
"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), skiprows=0)
print (df)
  Site Tank ID Product Volume Temperature Dip Time
0  aaa     bbb     ccc    ddd         eee      fff
1    a       b       c      d           e        f

But if you pass the list [0], it means "skip the 0'th row of the file", which here is the header row:

temp="""Site,Tank ID,Product,Volume,Temperature,Dip Time
aaa,bbb,ccc,ddd,eee,fff
a,b,c,d,e,f
"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), skiprows=[0])
print (df)
  aaa bbb ccc ddd eee fff
0   a   b   c   d   e   f
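
To tie this back to the call in the question, here is a minimal sketch. The sample rows (S1, T1, Diesel, ...) are made up for illustration and are not from the original file; only the header line and the read_csv arguments come from the question. It compares skiprows=[1], skiprows=[0], and no skiprows at all:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data shaped like the file in the question (values are made up)
csv_text = """Site,Tank ID,Product,Volume,Temperature,Dip Time
S1,T1,Diesel,1000,15.2,03/10/2019 08:00
S1,T2,Petrol,2000,16.1,04/10/2019 09:30
"""

# skiprows=[1]: the header (line 0) is kept, the first data row (line 1) is dropped
skipped = pd.read_csv(StringIO(csv_text), skiprows=[1], skipinitialspace=True,
                      dayfirst=True, parse_dates=['Dip Time'])

# skiprows=[0]: the header is dropped, so the first data row becomes the column names
no_header = pd.read_csv(StringIO(csv_text), skiprows=[0])

# no skiprows: the header and every data row are kept
full = pd.read_csv(StringIO(csv_text), skipinitialspace=True,
                   dayfirst=True, parse_dates=['Dip Time'])

print(len(skipped), len(full))
print(list(no_header.columns))
print('Dip Time' in no_header.columns)
```

Because skiprows=[0] discards the header line, 'Dip Time' is no longer a column name, which would explain the ValueError raised when parse_dates looks for it.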
jezrael
  • Thanks, I removed that parameter and it now works fine. Can you give a bit more explanation of why `skiprows=[0]` gave me an issue and what that parameter is used for? – Ratha Oct 04 '19 at 05:15
  • This isn't an answer. `skiprows = 0` (integer) means *"don't skip any rows"*, so it has no effect. Whereas `skiprows = [0]` (list with one element, 0) means *"skip the 0'th row, i.e. the header row"*, so it skips the header (with column names) and reads in the data. – smci Oct 04 '19 at 05:27
  • @smci - Sorry, you are right. Not clearly written, answer was edited. thank you. – jezrael Oct 04 '19 at 05:57