Python CSV reader start at line_num

Question

I need to read a CSV file with a couple of million rows. The file grows throughout the day. Each time I process the file (zipping each row into a dict), I later start the process over again, but this time creating dicts only for the new lines.

In order to get to the new lines, though, as far as I know I have to iterate over every line with the CSV reader and compare the line number to my 'last line read' number.

Is there a way to just 'skip' to that line number?
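A minimal sketch of the approach described above, assuming the file has a header row: every row is still parsed, and `reader.line_num` (the number of physical lines the reader has consumed so far, header included) is compared against the saved 'last line read' value. `read_new_rows` is a hypothetical name used only for illustration.

```python
import csv

def read_new_rows(path, last_line_read):
    """Parse every row, but keep only rows past `last_line_read`.

    `last_line_read` is a physical line count as reported by
    reader.line_num (header line included). Returns (new_rows, line_num)
    so the caller can store line_num for the next run.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is the header
        new_rows = []
        for row in reader:
            # line_num = physical lines consumed so far; older rows are skipped
            if reader.line_num > last_line_read:
                new_rows.append(dict(zip(header, row)))
        return new_rows, reader.line_num
```

The cost is that each run still reads and parses the whole file; only the dict creation is skipped for old rows.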

score 2 · Answer 1 · edited May 23 '17 at 11:45


You can't go to a specific line number unless every line has a fixed size and you know that size. When I say you can't, I mean you can't without loading the whole file into memory and splitting it on the \n character.
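A minimal sketch of the load-everything-and-split route. It assumes no quoted field contains an embedded newline, because slicing by physical line would cut such a record in half. `rows_from_line` is a hypothetical helper, and `start_line` is assumed to be a 0-based physical line index where 0 is the header.

```python
import csv

def rows_from_line(path, start_line):
    """Load the whole file, split it into lines, and parse from start_line on.

    start_line is a 0-based physical line index (0 is the header line).
    Assumes no quoted field spans more than one line.
    """
    with open(path) as f:  # universal newlines: "\r\n" is read as "\n"
        lines = f.read().split("\n")  # the whole file is held in memory here
    header = next(csv.reader(lines[:1]))
    # skip the header and drop empty strings, e.g. the one after the final newline
    body = [line for line in lines[max(start_line, 1):] if line]
    return [dict(zip(header, row)) for row in csv.reader(body)]
```

This only avoids building dicts for old rows; memory use grows with the file, which matters at a couple of million rows.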

If your CSV has a fixed line size, like this:

```
id,code,quantity
0001,ABC43,00100
0002,D2ZAD,00020
....
```

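where each line has the same length, you can seek directly to the byte offset linesize*(linenumber+1), where linesize includes the trailing newline and linenumber is the 0-based index of the data row you want to go to (the +1 skips the header line).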
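A minimal sketch of the fixed-size case, assuming ASCII content and that every line, header included, is exactly `line_size` bytes including its newline (17 bytes for `0001,ABC43,00100` plus `\n`). The file is opened in binary mode so that `seek()` takes a plain byte offset. `read_fixed_width_from` is a hypothetical name.

```python
import csv

def read_fixed_width_from(path, line_size, line_number):
    """Jump straight to data row line_number (0-based, header excluded).

    Assumes every line, header included, is exactly line_size bytes long
    including its newline, so data row n starts at line_size * (n + 1).
    """
    with open(path, "rb") as f:  # binary mode: seek() uses exact byte offsets
        header = next(csv.reader([f.readline().decode("ascii")]))
        f.seek(line_size * (line_number + 1))  # skip header + earlier rows
        rest = f.read().decode("ascii").splitlines()
    return [dict(zip(header, row)) for row in csv.reader(rest)]
```

With Windows line endings (`\r\n`), the same rows would be 18 bytes each, so `line_size` has to match the file exactly.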
Otherwise, you need to loop through the file up to the n-th line. There is, however, a built-in module named linecache that can help: see Go to a specific line in Python?
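A minimal sketch of the `linecache` route: call `linecache.getline(filename, n)` for increasing `n` (1-based) until it returns an empty string, which means line `n` does not exist. Note that `linecache` reads the entire file into memory on first access and keeps it cached, so for a file that keeps growing, `linecache.checkcache(filename)` is called first to discard a stale copy. `lines_from` is a hypothetical helper name.

```python
import linecache

def lines_from(filename, start):
    """Collect physical lines from line `start` (1-based) to end of file."""
    # linecache caches the whole file; drop the cached copy if the file changed
    linecache.checkcache(filename)
    lines = []
    n = start
    while True:
        line = linecache.getline(filename, n)
        if line == "":  # '' means line n does not exist (a blank line is "\n")
            break
        lines.append(line)
        n += 1
    return lines
```

The returned lines can be passed to `csv.reader(...)` to get rows back, since `csv.reader` accepts any iterable of strings.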


answered Feb 13 '14 at 18:13

Maxime Lorant


Thank you very much for this helpful response. I will try linecache this afternoon. – 10mjg Feb 13 '14 at 19:59
I'm a little curious as to how to proceed once I use linecache to get to the specific line. – 10mjg Feb 13 '14 at 22:32
I don't really know how `linecache` works internally. You could get each line with `linecache.getline(filename, n)`, with `n` starting from `linenumber`, and stop when it returns an empty string (which, according to the docs, means the line doesn't exist). Check the performance, but the docs say that `linecache` manages an internal cache, so it should be fine. – Maxime Lorant Feb 13 '14 at 22:36
I'm imagining a use for linecache where I could instruct it to grab all lines from a specific line to the end of the file (or a fixed number of lines, say, 20,000 at a time). If linecache can only grab one line at a time, I think it won't lead to an easy or elegant solution. I am going to continue researching obviously... Thank you... – 10mjg Feb 13 '14 at 22:50
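`linecache.getline` does return one line at a time. As a swapped-in technique (not `linecache`), `itertools.islice` over a `csv.reader` can grab everything from a given row to the end of the file in batches of, say, 20,000. This is a minimal sketch: earlier rows are still read and parsed, but `islice` discards them without building dicts. `iter_chunks`, `start_row`, and `chunk_size` are hypothetical names.

```python
import csv
import itertools

def iter_chunks(path, start_row, chunk_size=20_000):
    """Yield lists of up to chunk_size dict rows, from data row start_row
    (0-based, header excluded) to the end of the file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # skip the first start_row data rows, keep everything after them
        rows = itertools.islice(reader, start_row, None)
        while True:
            # take the next chunk_size rows (fewer at the end of the file)
            chunk = [dict(zip(header, r)) for r in itertools.islice(rows, chunk_size)]
            if not chunk:
                break
            yield chunk
```

Usage: `for chunk in iter_chunks("data.csv", last_row_count): process(chunk)`, where `process` stands in for whatever handles each batch.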

score 0 · Answer 2 · answered Feb 13 '14 at 18:14


If I were doing this, I think I would add a marker line after each read, before the file is saved again. Then I would read the file in as a string, split it on the marker, convert the part after the last marker back to a list, and feed that list to the process.
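A minimal sketch of the marker idea, assuming the program doing the reading may append to the file and that the marker text never appears in real data. `MARKER_LINE` and `process_new_rows` are hypothetical. Any other code that reads this CSV would also have to skip the marker lines.

```python
import csv

MARKER_LINE = "#--processed--#\n"  # hypothetical marker; must never occur in real data

def process_new_rows(path):
    """Return rows written after the last marker, then append a new marker."""
    with open(path) as f:
        text = f.read()
    header = next(csv.reader([text.splitlines()[0]]))
    segments = text.split(MARKER_LINE)
    new_lines = segments[-1].splitlines()  # everything after the last marker
    if len(segments) == 1:  # no marker yet: first run, so skip the header line
        new_lines = new_lines[1:]
    rows = [dict(zip(header, r)) for r in csv.reader(new_lines)]
    # mark everything read so far as processed
    with open(path, "a") as f:
        f.write(MARKER_LINE)
    return rows
```

If another process is appending rows while the marker is written, the two writes can interleave, so this only fits when appends can be paused briefly.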


PyNEwbie

