1

I am trying to scan a csv file and make adjustments line by line. In the end, I would like to remove the last line. How can I remove the last line within the same scanning loop?

My code below reads from the original file, makes adjustments and finally writes to a new file.

import csv

raw_data = csv.reader(open("original_data.csv", "r"), delimiter=",")
output_data = csv.writer(open("final_data.csv", "w"), delimiter=",")
lastline = # integer index of last line

for i, row in enumerate(raw_data):
    if i == 10:
        # some operations
        output_data.writerow(row)
    elif i > 10 and i < lastline:
        # some operations
        output_data.writerow(row)
    elif i == lastline:
        output_data.writerow([])
    else:
        continue
Boxuan
  • Remove the last line from which file? The original input file? –  Jan 26 '15 at 16:25
  • @Evert remove last line in output_data. – Boxuan Jan 26 '15 at 16:26
  • If you want to remove that last line, why write it in the first place? –  Jan 26 '15 at 16:27
  • @Evert I start with a file called `original_data.csv`. I do not know how many lines are there. `final_data.csv` is the output file I created with all adjustments made. After all adjustments, I do not want to keep the last line from the original file. However, how do I know if a line is the last line? – Boxuan Jan 26 '15 at 16:31
  • You might be able to simply slice `raw_data`: `for i, row in enumerate(raw_data[:-1]):`. –  Jan 26 '15 at 16:33
  • @Evert `csv.reader` returns an iterator. – Ashwini Chaudhary Jan 26 '15 at 16:34
  • @AshwiniChaudhary I figured that now, seeing the answers. It was just a blind guess, but an iterator is more logical for the IO here. –  Jan 26 '15 at 17:11
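
The slicing idea from the comments does not work on the reader directly, because an iterator cannot be sliced (attempting it raises a `TypeError`). It does work once the rows are materialized in a list. A minimal sketch, assuming the file is small enough to hold in memory; the in-memory sample stands in for `original_data.csv` and its contents are made up:

```python
import csv
import io

# Stand-in for open("original_data.csv", "r"); the rows are invented.
sample = io.StringIO("a,1\nb,2\nc,3\n")

raw_data = csv.reader(sample, delimiter=",")
rows = list(raw_data)   # reads every row into memory at once

kept = rows[:-1]        # all rows except the last one
for i, row in enumerate(kept):
    pass                # per-row adjustments would go here
```

The trade-off is memory: the whole file is loaded before the loop starts, which the generator-based answers avoid.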

5 Answers

4

You can make a generator to yield all elements except the last one:

def remove_last_element(iterable):
    iterator = iter(iterable)
    try:
        prev = next(iterator)
        while True:
            cur = next(iterator)
            yield prev
            prev = cur
    except StopIteration:
        return

Then you just wrap raw_data in it:

for i, row in enumerate(remove_last_element(raw_data)):
    # your code

The last line will be ignored automatically.

This approach has the benefit of only reading the file once.
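
To see the wrapper working end to end, here is a small self-contained sketch. The in-memory files, the sample rows, and the "skip the first row" rule are all made up for illustration; the generator is restated in a slightly more compact but equivalent form so the block runs on its own.

```python
import csv
import io

def remove_last_element(iterable):
    """Yield every item of `iterable` except the final one."""
    iterator = iter(iterable)
    try:
        prev = next(iterator)
    except StopIteration:
        return              # empty input: nothing to yield
    for cur in iterator:
        yield prev          # safe to emit: we know another item follows
        prev = cur

# Made-up stand-ins for original_data.csv and final_data.csv.
source = io.StringIO("h,0\na,1\nb,2\nfooter,x\n")
target = io.StringIO(newline="")

raw_data = csv.reader(source, delimiter=",")
output_data = csv.writer(target, delimiter=",")

for i, row in enumerate(remove_last_element(raw_data)):
    if i >= 1:              # hypothetical rule: skip the first row
        output_data.writerow(row)

written = target.getvalue()
```

Because the generator always holds back one row, the `footer` row is never handed to the loop body, so no `lastline` index is needed.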

Kolmar
2

A variation of @Kolmar's idea:

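def all_but_last(iterable):
    it = iter(iterable)  # accept any iterable, not only iterators
    try:
        buf = next(it)
    except StopIteration:
        return  # empty input; avoids a RuntimeError on Python 3.7+
    for item in it:
        yield buf
        buf = item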

for line in all_but_last(...):

Here's more generic code that extends the two-argument form of islice to accept a negative stop:

import itertools, collections

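def islice2(it, stop):
    it = iter(it)  # consume one shared iterator, even if given a range or list
    if stop >= 0:
        for x in itertools.islice(it, stop):
            yield x
    else:
        # Buffer the first -stop items; each later item pushes one out.
        d = collections.deque(itertools.islice(it, -stop))
        for item in it:
            yield d.popleft()
            d.append(item)


for x in islice2(range(20), -5):
    print(x, end=" ")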

# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
georg
  • That's a good modification, only mind that `itertools.islice` also has optional `start` and `step` parameters – Kolmar Jan 26 '15 at 17:03
  • @Kolmar: yeah, but I'm too lazy ;) `islice2(xs, 100, 0, -5)` sounds like a lot of work. – georg Jan 26 '15 at 17:12
1

You can iterate with a window of size 2 and write only the first value of each window. The last element never appears as the first value of a window, so it is skipped:

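from itertools import tee

def pairwise(iterable):
    # Two independent iterators over the same data; b runs one step ahead.
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # on Python 2, use itertools.izip to stay lazy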

for row, _ in pairwise(raw_data):
    output_data.writerow(row)

output_data.writerow([])
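
On Python 3.10 and later, the standard library ships `itertools.pairwise`, so the helper does not have to be written by hand. A short sketch under that assumption, with a fallback for older versions; the sample data and in-memory files are invented for illustration:

```python
import csv
import io

try:
    from itertools import pairwise      # available from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback: same sliding window of size 2, built with tee.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

source = io.StringIO("a,1\nb,2\nc,3\n")  # made-up input
target = io.StringIO(newline="")
writer = csv.writer(target, delimiter=",")

# Each pair is (row, following_row); the last row is never first in a pair.
for row, _following in pairwise(csv.reader(source, delimiter=",")):
    writer.writerow(row)

written = target.getvalue()
```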
ovgolovin
  • Worth noting that this method uses the fact that list multiplication copies references, thus the result is a pair of references to the same iterator, which gets incremented twice as fast with each loop. – Maciej Gol Jan 26 '15 at 16:39
  • @kroolik I have edited the answer. It should be `pairwise`, not grouper. Thanks! – ovgolovin Jan 26 '15 at 16:43
0

One idea is to track the length of each line as you iterate and, once you reach the last line, truncate the file by that amount, thus "shortening the file". I am not sure this is good practice, though.

See, e.g., "Python: truncate a file to 100 lines or less".
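
One possible reading of the truncation idea, sketched below: instead of summing line lengths, remember the offset at which each row starts in the output, and after the loop cut the output back to the start of the last row. The function name `write_rows_then_drop_last` is hypothetical, and the sketch assumes a seekable output object; it is demonstrated on an in-memory `io.StringIO` rather than a real file.

```python
import csv
import io

def write_rows_then_drop_last(rows, out_file):
    """Write all rows, then truncate away the last one that was written."""
    writer = csv.writer(out_file, delimiter=",")
    last_start = None
    for row in rows:
        last_start = out_file.tell()    # offset where this row begins
        writer.writerow(row)
    if last_start is not None:
        out_file.seek(last_start)
        out_file.truncate()             # cut at the start of the last row

out = io.StringIO(newline="")
write_rows_then_drop_last([["a", "1"], ["b", "2"], ["c", "3"]], out)
result = out.getvalue()
```

Calling `tell()` before every row is extra work on a real file, so the buffering approaches in the other answers are probably the simpler choice.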

user1267259
0

Instead of writing the current line each loop iteration, try writing the previously read line:

import csv

raw_data = csv.reader(open("original_data.csv", "r"), delimiter=",")
output_data = csv.writer(open("final_data.csv", "w"), delimiter=",")
last_iter = (None, None)

try:
    last_iter = (0, next(raw_data))
except StopIteration:
    # The file is empty
    pass
else:
    for new_row in raw_data:
        i, row = last_iter
        last_iter = (i + 1, new_row)

        if i == 10:
            # some operations
            output_data.writerow(row)
        elif i > 10:
            # some operations
            output_data.writerow(row)

    # Here, the last row of the file is in the `last_iter` variable.
    # It won't get written into the output file.
    output_data.writerow([])
Maciej Gol