Python scan file line by line and remove last line in the same loop

Question

I am trying to scan a csv file and make adjustments line by line. In the end, I would like to remove the last line. How can I remove the last line within the same scanning loop?

My code below reads from the original file, makes adjustments and finally writes to a new file.

import csv

raw_data = csv.reader(open("original_data.csv", "r"), delimiter=",")
output_data = csv.writer(open("final_data.csv", "w"), delimiter=",")
lastline = # integer index of last line

for i, row in enumerate(raw_data):
    if i == 10:
        # some operations
        output_data.writerow(row)
    elif i > 10 and i < lastline:
        # some operations
        output_data.writerow(row)
    elif i == lastline:
        output_data.writerow([])
    else:
        continue

Remove the last line from which file? The original input file? — , Jan 26 '15 at 16:25
If you want to remove that last line, why write it in the first place? — , Jan 26 '15 at 16:27
@Evert I start with a file called `original_data.csv`. I do not know how many lines are there. `final_data.csv` is the output file I created with all adjustments made. After all adjustments, I do not want to keep the last line from the original file. However, how do I know if a line is the last line? — Boxuan, Jan 26 '15 at 16:31
You might be able to simply slice `raw_data`: `for i, row in enumerate(raw_data[:-1]):`. — , Jan 26 '15 at 16:33
@AshwiniChaudhary I figured that now, seeing the answers. It was just a blind guess, but an iterator is more logical for the IO here. — , Jan 26 '15 at 17:11

score 4 · Accepted Answer · answered Jan 26 '15 at 16:32

You can make a generator to yield all elements except the last one:

def remove_last_element(iterable):
    iterator = iter(iterable)
    try:
        prev = next(iterator)
        while True:
            cur = next(iterator)
            yield prev
            prev = cur
    except StopIteration:
        return

Then you just wrap raw_data in it:

for i, row in enumerate(remove_last_element(raw_data)):
    # your code

The last line will be ignored automatically.

This approach has the benefit of only reading the file once.
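As a quick check of the wrapper, here is a minimal sketch that runs it over an in-memory CSV. The sample data and the `io.StringIO` stand-in for `original_data.csv` are assumptions for illustration; the generator is repeated only so the snippet runs on its own.

```python
import csv
import io

def remove_last_element(iterable):
    """Yield every item of iterable except the final one."""
    iterator = iter(iterable)
    try:
        prev = next(iterator)
        while True:
            cur = next(iterator)
            yield prev
            prev = cur
    except StopIteration:
        return

# Hypothetical sample data standing in for original_data.csv.
sample = io.StringIO("a,1\nb,2\nc,3\nTOTAL,6\n")
rows = list(enumerate(remove_last_element(csv.reader(sample))))
print(rows)
```

The indices from `enumerate` still start at 0 and match the original row numbers, so the `i == 10` style checks from the question should keep working unchanged.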

score 2 · Answer 2 · edited May 23 '17 at 11:43


A variation of @Kolmar's idea:

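def all_but_last(it):
    it = iter(it)  # accept any iterable, not only iterators
    try:
        buf = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in it:
        yield buf
        buf = item

for line in all_but_last(...):
    pass  # process line here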

Here's more generic code that extends islice (two-args version) for negative indexes:

import itertools, collections

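def islice2(it, stop):
    it = iter(it)  # one shared iterator, so buffered items are not read twice
    if stop >= 0:
        for x in itertools.islice(it, stop):
            yield x
    else:
        # Keep the last -stop items buffered; they are never yielded.
        d = collections.deque(itertools.islice(it, -stop))
        for item in it:
            yield d.popleft()
            d.append(item)


for x in islice2(range(20), -5):
    print(x, end=" ")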

# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

answered Jan 26 '15 at 16:42 by georg · edited May 23 '17 at 11:43 by Community

That's a good modification, only mind that `itertools.islice` also has optional `start` and `step` parameters – Kolmar Jan 26 '15 at 17:03
@Kolmar: yeah, but I'm too lazy ;) `islice2(xs, 100, 0, -5)` sounds like a lot of work. – georg Jan 26 '15 at 17:12

score 1 · Answer 3 · edited Jan 26 '15 at 16:42

You can iterate with a window of size 2 and write only the first value in each window. Because the last element never appears first in a window, it is skipped:

from itertools import tee

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    # zip is lazy on Python 3; on Python 2, itertools.izip keeps it lazy
    return zip(a, b)

for row, _ in pairwise(raw_data):
    output_data.writerow(row)

output_data.writerow([])
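To see why the last element drops out, here is a small sketch with a plain list (the sample values are made up). Each pair holds an item and its successor, so the final item never shows up in first position because it has no successor.

```python
from itertools import tee

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    return zip(a, b)

rows = ["r0", "r1", "r2", "r3"]  # hypothetical stand-in for CSV rows
pairs = list(pairwise(rows))
firsts = [row for row, _ in pairs]
print(pairs)
print(firsts)
```

Note that `tee` buffers items that one copy has consumed and the other has not; here the two copies stay one item apart, so the buffer should remain small.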

answered Jan 26 '15 at 16:37 by ovgolovin · edited Jan 26 '15 at 16:42

Worth noting that this method uses the fact that list multiplication copies references, thus the result is a pair of references to the same iterator, which gets incremented twice as fast with each loop. – Maciej Gol Jan 26 '15 at 16:39
@kroolik I have edited the answer. It should be `pairwise`, not grouper. Thanks! – ovgolovin Jan 26 '15 at 16:43

score 0 · Answer 4 · edited May 23 '17 at 12:20


One idea is to keep track of the length of each line as you iterate and, on reaching the last line, truncate the file by that length, thus "shortening the file". I'm not sure whether this is good practice, though.

See, e.g., "Python: truncate a file to 100 lines or less".
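A minimal sketch of the truncation idea, under some assumptions that are not in the answer: instead of summing line lengths, it records the output file's offset before each row is written (via `tell()`) and cuts the file back to the last recorded offset once the input is exhausted. The function name `copy_without_last_row` and the temp-file demo are hypothetical.

```python
import csv
import os
import tempfile

def copy_without_last_row(src_path, dst_path):
    """Copy CSV rows from src_path to dst_path, then cut off the final row."""
    last_row_start = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Remember where this row begins in the output file.
            last_row_start = dst.tell()
            writer.writerow(row)
        # Drop everything from the start of the last written row onward.
        dst.truncate(last_row_start)

# Demo with throwaway files (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "original_data.csv")
    dst_path = os.path.join(tmp, "final_data.csv")
    with open(src_path, "w", newline="") as f:
        f.write("a,1\nb,2\nTOTAL,3\n")
    copy_without_last_row(src_path, dst_path)
    with open(dst_path, newline="") as f:
        result = list(csv.reader(f))
print(result)
```

This still reads the input only once, but it writes the last row and then discards it, so the generator-based answers are probably the simpler choice when the output does not already exist.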

answered Jan 26 '15 at 16:29 by user1267259 · edited May 23 '17 at 12:20 by Community

score 0 · Answer 5 · answered Jan 26 '15 at 16:34

Instead of writing the current line each loop iteration, try writing the previously read line:

import csv

raw_data = csv.reader(open("original_data.csv", "r"), delimiter=",")
output_data = csv.writer(open("final_data.csv", "w"), delimiter=",")
last_iter = (None, None)

try:
    last_iter = (0, next(raw_data))
except StopIteration:
    # The file is empty
    pass
else:
    for new_row in raw_data:
        i, row = last_iter
        last_iter = (i + 1, new_row)

        if i == 10:
            # some operations
            output_data.writerow(row)
        elif i > 10:
            # some operations
            output_data.writerow(row)

    # Here, the last row of the file is in the `last_iter` variable.
    # It won't get written into the output file.
    output_data.writerow([])
