
I am trying to use the skiprows parameter of pandas' pd.read_csv function, but when I use it I only get the specific row where reading should start.

For example, skiprows=1 outputs ONLY row 1.

Any idea why this is happening?

I am loading the CSV file like this:

 df = pd.read_csv(filename, usecols=cols, skiprows=4, nrows=10)

The following is a snapshot of my df. With the call above, I expect to get the rows from Canada to Italy:

    iyear         country_txt      y       x          count code
0    1970           Argentina      4       0            15  ARG
1    1970           Australia      0       0             0  AUS
2    1970             Belgium      0       0             0  BEL
3    1970              Brazil      2       0             5  BRA
4    1970              Canada      1       0             1  CAN
5    1970            Colombia      0       0             0  COL
6    1970  Dominican Republic      1       0             1  DOM
7    1970             Germany      0       0             9  DEU
8    1970               Egypt      0       2             0  EGY
9    1970              Greece      2       0             2  GRC
10   1970           Guatemala      2       0             2  GTM
11   1970                Iran      0       0             4  IRN
12   1970             Ireland      1       0             0  IRL
13   1970              Israel      1       0             0  ISR
14   1970               Italy      0       0             1  ITA
15   1970               Japan      0       0             1  JPN
16   1970              Jordan      1       0             6  JOR
17   1970             Lebanon      0       0             0  LBN
18   1970              Mexico      0       0             1  MEX
19   1970         Netherlands      0       0             1  NLD
20   1970           Nicaragua      0       0             0  NIC
21   1970            Pakistan      1       0             0  PAK
22   1970            Paraguay      1       1             0  PRY
23   1970         Philippines      5       0             7  PHL
24   1970               Spain      0       0             2  ESP
25   1970         Switzerland     47       0             2  CHE
26   1970              Turkey      0       0             9  TUR
27   1970      United Kingdom     20       0             7  GBR
28   1970       United States     33       0           364  USA
29   1970             Uruguay      2       0            22  URY
30   1970           Venezuela      0       0             1  VEN

However the output is the following:

Empty DataFrame
Columns: [1970, Brazil, 2, 0, 5, BRA]
Index: []
no nein
  • Can you post a small __reproducible__ data set (CSV)? Does your CSV file have more than 5 rows? – MaxU - stand with Ukraine Sep 16 '17 at 16:12
  • Yeah, my dataset has around 3000 rows. Could it be because I am reading the same file multiple times? I.e. first I read rows 4-14, then rows 15-25, etc., so I have a for loop where skiprows is incremented by 10 each time. However, I doubt it, because if I just read the file without a loop and use skiprows=4, it still doesn't work. – no nein Sep 16 '17 at 16:23
  • I am making a picture for every 10 rows; I just found this to be the easiest solution. What's the difference between chunksize and skiprows? – no nein Sep 16 '17 at 16:27
  • We shouldn't be guessing - please show us a sample reproducible data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Sep 16 '17 at 16:28
  • Please see the edited post for a snapshot – no nein Sep 16 '17 at 16:34
  • and what does `print(cols)` show? – MaxU - stand with Ukraine Sep 16 '17 at 16:37
  • ['iyear', 'country_txt', 'y', 'x', 'count', 'code'] – no nein Sep 16 '17 at 16:52

1 Answer


I think you want to use the iterator=True parameter:

Demo:

import pandas as pd

# fn is the path to your CSV file
reader = pd.read_csv(fn, iterator=True)

# read first 5 rows
df = reader.get_chunk(5)

# read next 10 rows
df = reader.get_chunk(10)

# read next 12 rows
df = reader.get_chunk(12)

and so on.
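As for why the original call misbehaves: an integer skiprows=4 skips the first four lines of the file, and that count includes the header line. The fifth line (the Brazil row) is then parsed as the header, which would explain the `Columns: [1970, Brazil, 2, 0, 5, BRA]` line in your output. Here is a minimal sketch of both approaches. It assumes your file is comma-separated with the header on the first line; `csv_text` is a made-up in-memory stand-in for your file, not your real data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file from the question.
csv_text = """iyear,country_txt,y,x,count,code
1970,Argentina,4,0,15,ARG
1970,Australia,0,0,0,AUS
1970,Belgium,0,0,0,BEL
1970,Brazil,2,0,5,BRA
1970,Canada,1,0,1,CAN
1970,Colombia,0,0,0,COL
1970,Dominican Republic,1,0,1,DOM
"""
cols = ["iyear", "country_txt", "y", "x", "count", "code"]

# Integer skiprows: the header line is among the four skipped lines,
# so the Brazil line ends up being used as the column names.
bad = pd.read_csv(io.StringIO(csv_text), skiprows=4, nrows=2)
print(list(bad.columns))

# List-like skiprows: skip file lines 1-4 (0-based) and keep line 0,
# the header, so usecols can still match the column names.
good = pd.read_csv(io.StringIO(csv_text), usecols=cols,
                   skiprows=range(1, 5), nrows=2)
print(good)

# Iterator approach from this answer: each get_chunk call continues
# where the previous one stopped, and the header is read only once.
reader = pd.read_csv(io.StringIO(csv_text), usecols=cols, iterator=True)
first = reader.get_chunk(4)   # Argentina .. Brazil
second = reader.get_chunk(2)  # Canada, Colombia
print(second)
```

Regarding chunksize vs skiprows: passing chunksize=10 to read_csv also returns an iterator, which yields consecutive DataFrames of up to 10 rows each, so the file is read once instead of being reopened with a growing skiprows for every batch.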

MaxU - stand with Ukraine