I need to extract the first 2 rows and the last row from a large number of .txt and .csv files. How can I let a user choose a file and have it output a new .txt or .csv file containing just those 3 rows?
4 Answers
2
This is what you need:
def extract_lines(filename,outputname):
    l = []
    index = -1 # stays -1 if the file is empty
    with open(filename,'r') as f:
        for index,line in enumerate(f): # iterates the file line by line, which is memory efficient in case the csv is huge
            if index < 2: # first 2 lines
                l.append(line)
        if index > 1: # after the loop: the file has at least 3 lines, so add the last one
            l.append(line)
    with open(outputname,'w') as f:
        for line in l:
            f.write(line)

Alex Fung
- @Adirio Nope, it is meant to be outside the loop. The `if` statement checks that the file has at least 3 lines. If the file has only 2 lines, there is no point in adding the "last" line, since the first 2 lines already include it. – Alex Fung Feb 16 '17 at 08:46
- True, my bad. I would actually improve the efficiency by reading the first 2 lines outside the loop and then looping with an immediate discard. That way the `if` would not need to be evaluated for every line, which can take some time for big files. – Adirio Feb 16 '17 at 08:55
- @Adirio, true. If the question had asked for the **quickest way**, I would probably use `seek` instead. – Alex Fung Feb 16 '17 at 09:29
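The idea Adirio describes in the comments could look roughly like this: read the first two lines before the loop, then iterate over the rest and keep only the most recent line. This is a minimal sketch, not code from either commenter, and the name `extract_lines_fast` is made up for illustration.

```python
from itertools import islice


def extract_lines_fast(filename, outputname):
    """Copy the first two lines and the last line of filename to outputname."""
    with open(filename, 'r') as src:
        head = list(islice(src, 2))  # first two lines, read before the loop
        last = None
        for last in src:  # no per-line condition; each line replaces the previous one
            pass
    with open(outputname, 'w') as dst:
        dst.writelines(head)
        if last is not None:  # only set when the file has at least three lines
            dst.write(last)
```

The loop still reads every line, so it stays linear in file size; it only avoids the comparison on each iteration.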
1
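def get_lines(filename, front=2, rear=1):
    result = []
    with open(filename, 'rb') as f:
        # Read the first `front` lines from the start of the file.
        for i, val in enumerate(f):
            if i >= front:
                break
            result.append(val)
        back_pos = -2
        f.seek(back_pos, 2) # start at the second-to-last byte, skipping a trailing newline
        rear_count = 0
        while True:
            if b'\n' in f.read(1): # the file is opened in binary mode, so compare against bytes
                rear_count += 1
                if rear_count >= rear:
                    result.extend(f.readlines())
                    break
            back_pos -= 1
            f.seek(back_pos, 2) # step one byte further back from the end
    return result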
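Reading the first rows is easy, but reading the last row is harder. Iterating over every row is very slow for large files, so the function above seeks backward from the end of the file instead. Note that it assumes the file has more than `front + rear` lines; on a very short file the backward seek can move past the start of the file and fail.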
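Seeking backward one byte at a time works, but reading larger blocks from the end is a common variation that also copes with short files. The following is a rough sketch under stated assumptions: the name `last_line` and the `block_size` default are hypothetical, and it returns the last non-empty line as bytes without its line ending.

```python
import os


def last_line(filename, block_size=4096):
    """Return the last non-empty line of filename as bytes, without its line ending."""
    with open(filename, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # file size in bytes
        data = b''
        while pos > 0:
            step = min(block_size, pos)  # never seek before the start of the file
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # Stop once a newline precedes the content of the last line.
            if b'\n' in data.rstrip(b'\r\n'):
                break
    stripped = data.rstrip(b'\r\n')
    return stripped[stripped.rfind(b'\n') + 1:]
```

For an empty file this returns `b''`, and for a one-line file it returns that single line.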

duke yu
0
You can also use a bash script to do this.
#!/bin/bash
for file in $(find . -name '*.txt' -o -name '*.csv')
do
    sed -n -e '1,2p' -e '$p' "${file}" > "result${file:(-5)}"
done
This script finds all files ending in .txt or .csv, extracts the first two lines and the last line of each, and stores them in a new file named "result" followed by the last five characters of the original file name.
For example, given three files named file1.txt, file2.txt, and file3.csv, it extracts the three lines from each and stores them in result1.txt, result2.txt, and result3.csv respectively.
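For comparison with the Python answers, the same batch job could be sketched in Python. This is not a translation of the script above: the function name `summarize_files` and the `result_` prefix are assumptions chosen for illustration, and the output naming differs from the bash version.

```python
from collections import deque
from itertools import islice
from pathlib import Path


def summarize_files(directory, prefix='result_'):
    """Write the first two lines and the last line of each .txt/.csv file under directory."""
    written = []
    # sorted() collects the paths up front, so newly written files are not revisited.
    for path in sorted(Path(directory).rglob('*')):
        if not path.is_file() or path.suffix not in ('.txt', '.csv'):
            continue
        if path.name.startswith(prefix):
            continue  # skip output files left by an earlier run
        with path.open('r') as src:
            head = list(islice(src, 2))  # first two lines
            tail = deque(src, maxlen=1)  # last remaining line, empty if there is none
        out = path.with_name(prefix + path.name)
        out.write_text(''.join(head + list(tail)))
        written.append(out)
    return written
```

Unlike the `sed` command, this does not repeat the second line when a file has only two lines.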

bwangel
0
This way you can return the lines you want; it is only a matter of adjusting the range. As written, it returns the first two lines.
df=open(r"D:\...\nameFile.txt",encoding='utf8')
def etiqueta(df):
    lista=[]
    for line,x in zip(df,range(0,2)):
        lista.append(line)
    return lista
etiqueta(df)
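A similar result can be had with `itertools.islice`, which takes the line count directly. One difference is worth noting: `zip(df, range(0, 2))` pulls a third line from the file before it notices the range is exhausted, so that line is lost if you keep reading from the same handle, whereas `islice` stops without consuming it. The sketch below uses a hypothetical helper name, `first_lines`, and also closes the file when done.

```python
from itertools import islice


def first_lines(path, count=2, encoding='utf8'):
    """Return the first `count` lines of the file at path (fewer if the file is shorter)."""
    with open(path, encoding=encoding) as f:
        return list(islice(f, count))
```

Changing `count` plays the same role as adjusting the range in the answer above.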