I have the following two functions to extract data from a CSV file; one returns a list and the other a generator:
List:
import csv
import itertools

def data_extraction(filename, start_line, node_num, span_start, span_end):
    with open(filename, "r") as myfile:
        file_ = csv.reader(myfile, delimiter=',')  # extracts data from .txt as lines
        return [filter(lambda a: a != '', row[span_start:span_end])
                for row in itertools.islice(file_, start_line, node_num + 1)]
Generator:
def data_extraction(filename, start_line, node_num, span_start, span_end):
    with open(filename, "r") as myfile:
        file_ = csv.reader(myfile, delimiter=',')  # extracts data from .txt as lines
        return (itertools.ifilter(lambda a: a != '', row[span_start:span_end])
                for row in itertools.islice(file_, start_line, node_num + 1))
I start my program with a call to one of the functions above to extract the data.
The next line is: print [x for x in data]
When I use the function that returns a list, everything works fine; when I use the generator, I get: ValueError: I/O operation on closed file
I gathered from other questions that this is probably because the file opened by the with open statement is closed once my data_extraction function returns.
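A minimal sketch of that behaviour, assuming Python 3 and a throwaway temporary file. The name lazy_rows is hypothetical and only stands in for the generator version of data_extraction. The generator expression is returned before any row has been read, so the first read is attempted only after the with block has already closed the file:

```python
import csv
import itertools
import os
import tempfile


def lazy_rows(filename):
    # Hypothetical stand-in for the generator version of data_extraction.
    # The generator expression is created here, but no row is read yet.
    with open(filename, "r", newline="") as myfile:
        reader = csv.reader(myfile, delimiter=",")
        return (row for row in itertools.islice(reader, 0, 2))


# Write a small temporary CSV file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a,b\nc,d\n")
    path = tmp.name

rows = lazy_rows(path)  # the with block has already exited at this point
try:
    list(rows)          # the first read from the file is attempted only now
    error = None
except ValueError as exc:
    error = str(exc)
finally:
    os.remove(path)

print(error)
```

The list version does not hit this because the list comprehension reads every row before the function returns, while the file is still open.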
The question is: is there a workaround that lets me keep an independent function to extract the data, so that I don't have to put all my code inside one function? And secondly, will I be able to reset the generator to use it multiple times?
The reason for wanting to keep the generator rather than the list is that I am dealing with large datasets.
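One possible direction, sketched here under assumptions rather than as a definitive fix: turn data_extraction into a generator function that uses yield inside the with block. The function body then runs only while the caller iterates, so the file stays open until the rows are exhausted. The sketch assumes Python 3 (hence the newline="" argument to open) and replaces itertools.ifilter with a per-row list comprehension, which is a substitution of mine and not part of the original code:

```python
import csv
import itertools


def data_extraction(filename, start_line, node_num, span_start, span_end):
    # Generator function: nothing runs until the caller starts iterating,
    # and the with block stays active while rows are being yielded.
    with open(filename, "r", newline="") as myfile:
        reader = csv.reader(myfile, delimiter=",")
        for row in itertools.islice(reader, start_line, node_num + 1):
            # Drop empty fields. The list is built per row, so only one
            # row is held in memory at a time.
            yield [a for a in row[span_start:span_end] if a != ""]
```

A generator object cannot be rewound, but calling data_extraction(...) again returns a fresh generator that reopens the file and starts from start_line, which may be enough to "reset" it. If the caller stops iterating early, the file is closed when the generator is closed or garbage collected.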