
I am a complete newbie to programming, and this is the first real program I am trying to write.

So I have a huge CSV file (hundreds of columns and thousands of rows) from which I am trying to extract only a few columns, based on the value in one field. It works fine and I get nice output, but the problem arises when I try to encapsulate the same logic in a function: it returns only the first extracted row, whereas print works fine.

I have been playing with this for hours and have read other examples here, and now my mind is mush.

import csv
import sys

newlogfile = csv.reader(open(sys.argv[1], 'rb'))
outLog = csv.writer(open('extracted.csv', 'w'))

def rowExtractor(logfile):
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            return a

outLog.writerow(rowExtractor(newlogfile))
b1ackzer0
  • Could it be you're resetting your list to empty with each new row? What happens if you put the a = [] before the for? Also, the return will terminate you right after the first row as indicated by user457188. I'd place the return after for (same indentation). – octopusgrabbus Mar 22 '12 at 16:30
  • please show when and how you call the rowExtractor function and where you put the print statement that works – Gil.I Mar 22 '12 at 16:31
  • @Gil.I if the print statement was where the return statement is it would print out all of the rows, because print won't break out of the function. I think that's what he's saying he did. – dave Mar 22 '12 at 16:36
  • @Gil.I, exactly what dave said :) – b1ackzer0 Mar 22 '12 at 16:45

2 Answers


You are exiting prematurely. Because return a sits inside the for loop, the function returns as soon as the first matching row is found, so none of the later rows are ever processed.

A simple way to do this would be to do:

def rowExtractor(logfile):
    # output holds all of the rows
    output = []
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            output.append(a)
    #notice that the return statement is outside of the for-loop
    return output
outLog.writerows(rowExtractor(newlogfile))

You could also consider using yield, which turns the function into a generator that produces the matching rows one at a time.
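A minimal sketch of what the yield variant might look like, assuming the same column indices as in the question. The name `iter_matching_rows`, the helper `make_row`, and the in-memory sample data are hypothetical and exist only to keep the example self-contained; it is written for Python 3.

```python
import csv
import io

def iter_matching_rows(rows):
    """Yield the selected columns of every row whose column 32 is 'No'."""
    for row in rows:
        if row[32] == 'No':
            # yield hands back one result, then the loop resumes on the next request
            yield [row[44], row[58], row[83], row[32]]

# Hypothetical helper: build an 84-column row with recognisable values.
def make_row(flag, tag):
    row = [''] * 84
    row[32], row[44], row[58], row[83] = flag, tag + '44', tag + '58', tag + '83'
    return row

sample = [make_row('No', 'a'), make_row('Yes', 'b'), make_row('No', 'c')]

out = io.StringIO()
# writerows accepts any iterable of rows, including a generator
csv.writer(out).writerows(iter_matching_rows(sample))
extracted = out.getvalue()
```

Since the generator never builds the full list, this shape may suit large files better, but either version should produce the same output.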

dave

You've got a return statement in your function...when it hits that line, it will return (thus terminating your loop). You'd need yield instead.

See What does the "yield" keyword do in Python?
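To illustrate the difference in isolation, here is a toy comparison that is not taken from the question's code. The function names `first_even` and `all_evens` are made up for the illustration.

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            return n  # the function ends here; later items are never examined

def all_evens(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n  # execution pauses here and resumes when the caller asks for more

data = [1, 2, 3, 4, 6]
first = first_even(data)       # 2
evens = list(all_evens(data))  # [2, 4, 6]
```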

mpen
  • I think he'd also need to switch to `.writerows`, but return->yield and adding the s should make the code work. – DSM Mar 22 '12 at 16:32
  • @DSM: Yes. It was more of a nudge in the right direction so he could see where the problem was. – mpen Mar 22 '12 at 16:33