0

I need to extract the first 2 rows and the last row from a lot of .txt and .csv files. How can I let a user choose a file and have it output a new .txt or .csv file with just those 3 rows in it?

mbf94

4 Answers

2

This is what you need:

def extract_lines(filename, outputname):
    l = []
    index = -1  # guard so the length check below also works for an empty file
    with open(filename, 'r') as f:
        for index, line in enumerate(f):  # iterate line by line, which is memory efficient in case the csv is huge
            if index < 2:  # first 2 lines
                l.append(line)
        if index > 1:  # the file has at least 3 lines, so the last line is not already included
            l.append(line)
    with open(outputname, 'w') as f:
        for line in l:
            f.write(line)
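
To let the user choose the file, one option is a standard file dialog (a minimal sketch, assuming a desktop environment where tkinter is available; the output-name scheme is just illustrative):

import os
from tkinter import Tk, filedialog

root = Tk()
root.withdraw()  # hide the empty root window; we only want the file dialog
filename = filedialog.askopenfilename(filetypes=[("Text/CSV files", "*.txt *.csv")])
if filename:
    folder, name = os.path.split(filename)
    extract_lines(filename, os.path.join(folder, "extracted_" + name))  # e.g. data.csv -> extracted_data.csv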
Alex Fung
  • @Adirio Nope. It is meant to be outside of the loop. the If statement checks the no of lines in the file is at least 3 lines. if the file has only 2 lines, then there is no point to add the "last" line as the first 2 lines include the "last". – Alex Fung Feb 16 '17 at 08:46
  • True, my bad. I would actually improve the efficiency by reading the 2 first lines out of the loop and looping with an instant discard. This way the `if` would not need to be evaluated for every line, which can take some time for big files. – Adirio Feb 16 '17 at 08:55
  • @Adirio, True. If the question mentions **quickest way**, I would probably use `seek` instead. – Alex Fung Feb 16 '17 at 09:29
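
A minimal sketch of Adirio's suggestion (the function name is illustrative): read the first two lines before the loop, then let the loop do nothing but remember the most recent line, so no `if` runs per row:

def extract_lines_fast(filename, outputname):
    with open(filename, 'r') as f:
        first = [line for _, line in zip(range(2), f)]  # at most the first 2 lines
        last = None
        for line in f:  # continues from line 3; the body is a plain assignment
            last = line
    with open(outputname, 'w') as out:
        out.writelines(first)
        if last is not None:  # None means the file had 2 lines or fewer
            out.write(last)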
1
def get_lines(filename, front=2, rear=1):
    result = []
    with open(filename, 'rb') as f:
        for i, val in enumerate(f):
            if i >= front:
                break
            result.append(val)

        back_pos = -2
        f.seek(back_pos, 2)  # jump to the second-to-last byte (whence=2 means: relative to the end of the file)

        rear_count = 0
        while True:
            if b'\n' in f.read(1):  # the file is opened in binary mode, so compare against bytes
                rear_count += 1

            if rear_count >= rear:
                result.extend(f.readlines())
                break

            back_pos -= 1
            f.seek(back_pos, 2)

    return result

It's easy to read the first rows, but harder to get the last one: iterating through every row just to reach the end is slow for large files, so this seeks backwards from the end instead.
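
Example usage (a sketch; the file names are illustrative). Because the file is opened in binary mode, the collected lines are bytes, so write them back out in binary mode too:

lines = get_lines('input.csv', front=2, rear=1)
with open('output.csv', 'wb') as out:
    out.writelines(lines)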

duke yu
0

I think you can also use a bash script to achieve this.

#!/bin/bash

for file in $(find . -name '*.txt' -o -name '*.csv' )
do
    sed -n -e '1,2p' -e '$p' "${file}" > "result${file:(-5)}"
done

This script searches for all files ending in .txt or .csv, extracts the first two lines and the last line from each, and stores those lines in a new file.

For example, if I have three files named file1.txt, file2.txt, and file3.csv, it will extract the three lines from each file and store them in result1.txt, result2.txt, and result3.csv respectively.
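
Roughly the same batch behaviour is possible in Python, in case a shell is not available (a sketch that reuses the extract_lines function from the answer above; the result_ prefix is just illustrative):

import glob
import os

for path in glob.glob('*.txt') + glob.glob('*.csv'):
    folder, name = os.path.split(path)
    extract_lines(path, os.path.join(folder, 'result_' + name))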

bwangel
0

This way you can return the lines that you want; it is only a question of playing with the range.

df = open(r"D:\...\nameFile.txt", encoding='utf8')

def etiqueta(df):
    lista = []
    for line, x in zip(df, range(0, 2)):  # stops after the first 2 lines
        lista.append(line)
    return lista

etiqueta(df)
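
For instance (a sketch with a hypothetical parameter added for illustration), widening the range returns more leading lines:

def etiqueta_n(df, n):
    lista = []
    for line, x in zip(df, range(0, n)):  # stops after the first n lines
        lista.append(line)
    return lista

etiqueta_n(open(r"D:\...\nameFile.txt", encoding='utf8'), 5)  # first 5 lines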