In Python (2.7.6), I am trying to extract some columns of a CSV file as lists. Given the CSV file test.csv with the content:

COL_ONE,COL_TWO,COL_THREE
val_R1C1,val_R1C2,val_R1C3
val_R2C1,val_R2C2,val_R2C3
val_R3C1,val_R3C2,val_R3C3
val_R4C1,val_R4C2,val_R4C3

I expect the following code to do this for me:

import csv

reader = csv.DictReader(open("test.csv", "r"))
col2 = list(c2['COL_TWO'] for c2 in reader)
col3 = list(c3['COL_THREE'] for c3 in reader)

Unfortunately, when I print the two lists, col2 and col3, the second list is empty.

['val_R1C2', 'val_R2C2', 'val_R3C2', 'val_R4C2']
[]

This alternative has the same result:

reader = csv.DictReader(open("test.csv", "r"))
col2 = []
for c2 in reader:
    col2.append(c2['COL_TWO'])
col3 = []
for c3 in reader:
    col3.append(c3['COL_THREE'])

The workaround is easy:

col2 = []
col3 = []
for cval in reader:
    col2.append(cval['COL_TWO'])
    col3.append(cval['COL_THREE'])

I get what I would have expected in the previous two examples:

['val_R1C2', 'val_R2C2', 'val_R3C2', 'val_R4C2']
['val_R1C3', 'val_R2C3', 'val_R3C3', 'val_R4C3']

I would appreciate some help understanding what I am doing wrong. Why do I not get the same results in all three cases?

1 Answer

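The reader is an iterator over the open file, so the first pass consumes every row and leaves nothing for the second. Rewind the underlying file to position 0 between reads: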

import csv

fh = open("test.csv", "r")
reader = csv.DictReader(fh)
# no next(reader) needed here: DictReader consumes the header row itself
col2 = list(c2['COL_TWO'] for c2 in reader)
fh.seek(0)  ## <------------------- this is the point
next(reader) # after the rewind the header line comes back as a data row, so skip it
col3 = list(c3['COL_THREE'] for c3 in reader)
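
As an alternative to rewinding, here is a minimal sketch that reads the rows once into a list and builds both columns from it. It assumes the file is small enough to hold in memory. The temporary file stands in for test.csv, and the names `rows` and `leftover` are illustrative, not from the question. The `leftover` list shows that a second pass over the same reader yields nothing.

```python
import csv
import os
import tempfile

# Recreate the sample file from the question in a temporary location
# (illustrative setup; in practice the path would be "test.csv").
content = (
    "COL_ONE,COL_TWO,COL_THREE\n"
    "val_R1C1,val_R1C2,val_R1C3\n"
    "val_R2C1,val_R2C2,val_R2C3\n"
    "val_R3C1,val_R3C2,val_R3C3\n"
    "val_R4C1,val_R4C2,val_R4C3\n"
)
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as fh:
    fh.write(content)

with open(path, "r") as fh:
    reader = csv.DictReader(fh)
    rows = list(reader)      # single pass: every data row is kept in memory
    leftover = list(reader)  # the reader is exhausted now, so this is empty

os.remove(path)

# Unlike the reader, a list can be traversed as often as needed.
col2 = [row['COL_TWO'] for row in rows]
col3 = [row['COL_THREE'] for row in rows]

print(col2)
print(col3)
print(leftover)
```

This keeps the file handling in one place and closes the file as soon as the rows are loaded. For very large files, the single-loop workaround from the question avoids holding all rows at once.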
logc
  • When I run this code and then print `col2` and `col3`, I get: `['val_R1C2', 'val_R2C2', 'val_R3C2', 'val_R4C2']` `['COL_THREE', 'val_R1C3', 'val_R2C3', 'val_R3C3', 'val_R4C3']` – ChurlishPedant Mar 23 '15 at 00:38
  • You are not showing at least a part of the code where you are skipping the first row of the file; I am going to edit my answer to provide for this, but please look into your code where this happens ... – logc Mar 23 '15 at 10:11
  • logc, thanks for your explanation. Actually, the code I have shown is complete save for two print statements at the end. The code as you show it gives me `['val_R2C2', 'val_R3C2', 'val_R4C2']` `['val_R1C3', 'val_R2C3', 'val_R3C3', 'val_R4C3']` If I remove the first `next` statement, I get the desired result. – ChurlishPedant Mar 23 '15 at 11:57
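
The outputs reported in the comments can be reproduced by checking what the reader yields at each step. The sketch below uses a shortened temporary file in place of test.csv, and its variable names are illustrative. It suggests that `csv.DictReader` takes the header from the first line by itself on the first read, and that after `seek(0)` the same line comes back as an ordinary data row. That would explain why only the second pass needs a `next` call.

```python
import csv
import os
import tempfile

# Shortened stand-in for test.csv (illustrative setup only).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as fh:
    fh.write("COL_ONE,COL_TWO,COL_THREE\n"
             "val_R1C1,val_R1C2,val_R1C3\n"
             "val_R2C1,val_R2C2,val_R2C3\n")

with open(path, "r") as fh:
    reader = csv.DictReader(fh)

    # First pass: the header line is consumed as field names, not as data.
    first_pass = [row['COL_TWO'] for row in reader]

    # Rewind the file; the reader keeps the field names it already has.
    fh.seek(0)

    # The header line is now read again, this time as a regular row.
    after_rewind = next(reader)

    # The remaining rows are the real data.
    second_pass = [row['COL_THREE'] for row in reader]

os.remove(path)

print(first_pass)
print(after_rewind['COL_THREE'])
print(second_pass)
```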