Using awk, I am having difficulty trying to print the second and last three lines from multiple text files. In addition, I would like to direct the output to a text file.
Any help or suggestions would be appreciated.
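The following one-liner has the advantage that the whole file is not held in memory; only the three most recent lines are kept. Note that NR counts lines across all input files and END runs only once, so with several files it prints the second line of the first file and the last three lines of the last file. To save the output, append a redirection such as > output.txt (the name is just an example).
    awk 'NR == 2 {print}; {line1 = line2; line2 = line3; line3 = $0} END {print line1; print line2; print line3}' files*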
Edit:
The following uses some code from the gawk manual that is portable to other versions of AWK. It provides per-file processing. Note that gawk version 4 provides BEGINFILE and ENDFILE rules.
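    #!/usr/bin/awk -f

    # Called at the start of each file: reset the rolling buffer.
    function beginfile (file) {
        line1 = line2 = line3 = ""
    }

    # Called at the end of each file: print the last three lines seen.
    function endfile (file) {
        print line1; print line2; print line3
    }

    # FILENAME changes when a new file starts; finish the old one first.
    FILENAME != _oldfilename \
    {
        if (_oldfilename != "")
            endfile(_oldfilename)
        _oldfilename = FILENAME
        beginfile(FILENAME)
    }

    END { endfile(FILENAME) }

    # FNR restarts at 1 for every file, unlike NR.
    FNR == 2 {
        print
    }

    # Shift the buffer so it always holds the three most recent lines.
    {
        line1 = line2; line2 = line3; line3 = $0
    }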
Save that as a file, perhaps calling it "fileparts", and make it executable:
    chmod u+x fileparts
Then you can do:
    ./fileparts file1 file2 anotherfile somemorefiles*.txt
and it will output the second line and the last three lines of each file in one set of output. Add a redirection such as > output.txt to collect that output in a text file.
Alternatively, you can modify the script to write to separate files, or use a shell loop to produce one output file per input file:
    for file in file1 file2 anotherfile somemorefiles*.txt
    do
        ./fileparts "$file" > "$file.out"
    done
You can name the output files however you like. They will be text files.
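For comparison, the same per-file logic can be sketched in Python with the standard fileinput module, which also reads several files as one stream. Here filelineno() plays the role of FNR and isfirstline() marks the point where the AWK script calls endfile and beginfile. The function name second_and_last3_stream is made up for this illustration, and unlike the AWK script it does not print blank lines for files shorter than three lines.

```python
import fileinput
from collections import deque


def second_and_last3_stream(paths):
    """Yield the second line and the last three lines of each file in paths.

    Hypothetical helper for illustration. paths must be non-empty:
    fileinput falls back to sys.argv and then stdin when given no files.
    """
    tail = deque(maxlen=3)  # rolling buffer of the three most recent lines
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if stream.isfirstline() and stream.lineno() > 1:
                # A new file has started: flush the previous file's tail.
                yield from tail
                tail.clear()
            if stream.filelineno() == 2:  # per-file counter, like FNR
                yield line
            tail.append(line)
    # Flush the tail of the last file, like the END rule.
    yield from tail
```

Since the generator yields lines with their newlines intact, the result can be passed straight to the writelines() method of a file opened for writing.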
To avoid reading the entire file into memory at once, use a deque with a maxlen of 3 to create a rolling buffer for capturing the last 3 lines:
    from collections import deque

    def get2ndAndLast3LinesFrom(filename):
        with open(filename) as infile:
            # advance past first line
            next(infile)
            # capture second line
            second = next(infile)
            # iterate over the rest of the file a line at a time, saving the final 3
            last3 = deque(maxlen=3)
            last3.extend(infile)
            return second, list(last3)
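To cover the other two parts of the question (multiple files, and output directed to a text file), the same rolling-buffer idea could be wrapped as in the sketch below. The names second_and_last3 and write_summary are invented for this example. It also makes two assumptions that differ from the function above: files with fewer than two lines are tolerated rather than raising StopIteration, and the buffer is seeded with the first two lines so that short files still report their true last lines.

```python
from collections import deque
from itertools import islice


def second_and_last3(filename):
    """Return (second_line, last_three_lines) for one file.

    second_line is None when the file has fewer than two lines.
    """
    with open(filename) as infile:
        # Consume at most the first two lines.
        head = list(islice(infile, 2))
        second = head[1] if len(head) == 2 else None
        # Seed the buffer with the lines already read, then roll
        # through the rest of the file one line at a time.
        last3 = deque(head, maxlen=3)
        last3.extend(infile)
    return second, list(last3)


def write_summary(filenames, out_path):
    """Write the second and last three lines of every file to out_path."""
    with open(out_path, "w") as out:
        for name in filenames:
            second, last3 = second_and_last3(name)
            if second is not None:
                out.write(second)
            out.writelines(last3)
```

A call such as write_summary(["file1", "file2"], "output.txt") would then collect everything in one text file; the file names here are placeholders.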
You could generalize this approach to a function that would take any iterable:
    def lastN(n, seq):
        buf = deque(maxlen=n)
        buf.extend(seq)
        return list(buf)
Then you can create different length "last-n" functions using partial:
from functools import partial
last3 = partial(lastN, 3)
    print(last3(range(100000000)))  # use xrange instead of range on Python 2
If you aren't wedded to Python or AWK for the implementation, you can do something very straightforward using your shell and the standard head/tail utilities.
    for file in "$@"; do
        head -n2 "$file" | tail -n1
        tail -n3 "$file"
    done
You can also wrap this in a function or place it in a script, and then call it from within Python or AWK with subprocess.check_output() if you really want, but in such cases it may just be easier to use native methods rather than spawning an external process.
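As a rough sketch of the subprocess.check_output() route from Python, the pipeline can be reproduced without a shell by feeding the output of head to tail through the input argument. The function name second_and_last3_via_tools is made up for this example, and it assumes that head and tail are on the PATH and that the file is UTF-8 text.

```python
import subprocess


def second_and_last3_via_tools(path):
    """Return the second line plus the last three lines of path as one string.

    Hypothetical helper: assumes head and tail are installed and the
    file decodes as UTF-8.
    """
    # Equivalent of: head -n2 "$file" | tail -n1
    first_two = subprocess.check_output(["head", "-n", "2", path])
    second = subprocess.check_output(["tail", "-n", "1"], input=first_two)
    # Equivalent of: tail -n3 "$file"
    last3 = subprocess.check_output(["tail", "-n", "3", path])
    return (second + last3).decode()
```

This spawns three processes per file, which illustrates why the native approaches are usually the simpler choice.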
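This would work, but it loads the entire file into memory, which might not be ideal if your files are very large.
    with open(filename) as infile:      # filename is the path to the input file
        text = infile.readlines()
    print(text[1].rstrip("\n"))         # second line (list indices start at 0)
    for line in text[-3:]:              # last three lines, in file order
        print(line.rstrip("\n"))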
There are also some good alternatives discussed here.
I don't know about awk, but if you are using Python, I guess you will need something like this:
    with open('test1.txt') as inf:
        lines = inf.readlines()
    # Note: this writes plain text, whatever the output file's extension.
    with open('Spreadsheet.ods', 'w') as outf:
        outf.write(lines[1])         # second line
        outf.writelines(lines[-3:])  # last three lines