How can I print second and last three lines from multiple text files, in AWK or Python?

Question

Using awk, I am having difficulty trying to print the second and last three lines from multiple text files. In addition, I would like to direct the output to a text file.

Any help or suggestions would be appreciated.

Clarify what you mean by "table." SQL table? A specific language format? Spreadsheet? Pretty-printed text? — Greg E., Jun 16 '12 at 12:43
This question's weird in that it's asking for, and receiving, answers in both AWK and Python. — smci, Apr 24 '18 at 22:51

Dennis Williamson · Accepted Answer · 2012-06-17T04:18:04.097

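This has the advantage that the whole file is not held in memory. Note that it treats all the input files as a single stream, so it prints the second line of the first file and the last three lines of the last file.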

awk 'NR == 2 {print}; {line1 = line2; line2 = line3; line3 = $0} END {print line1; print line2; print line3}' files*

Edit:

The following uses some code from the gawk manual that is portable to other versions of AWK. It provides per-file processing. Note that gawk version 4 provides BEGINFILE and ENDFILE rules.

#!/usr/bin/awk -f
function beginfile (file) {
    line1 = line2 = line3 = ""
}

function endfile (file) {
    print line1; print line2; print line3
}

FILENAME != _oldfilename \
     {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }

     END   { endfile(FILENAME) }

FNR == 2 {
    print
}

{
    line1 = line2; line2 = line3; line3 = $0
}

Save that as a file, perhaps calling it "fileparts". Then do:

chmod u+x fileparts

Then you can do:

./fileparts file1 file2 anotherfile somemorefiles*.txt

and it will output the second line and the last three lines of each file in one set of output.

Alternatively, you can modify the script to write to separate files, or use a shell loop to do so:

for file in file1 file2 anotherfile somemorefiles*.txt
do
    ./fileparts "$file" > "$file.out"
done

You can name the output files however you like. They will be text files.
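
For readers following along in Python, the per-file logic of the AWK script above could be sketched with the standard fileinput module, whose filelineno() and isfirstline() methods play roughly the role of FNR and the FILENAME check. The function name second_and_last3 is hypothetical, and this is only one possible arrangement, not a translation endorsed by the answer.

```python
import fileinput
from collections import deque


def second_and_last3(paths):
    """Yield (path, second_line, last_three_lines) for each non-empty file."""
    if not paths:
        # fileinput would fall back to stdin for an empty list, so stop early
        return
    current = None          # file being read, similar to _oldfilename
    second = None
    tail = deque(maxlen=3)  # rolling buffer, similar to line1/line2/line3
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if stream.isfirstline():
                # a new file has started: emit the previous file's result
                if current is not None:
                    yield current, second, list(tail)
                current = stream.filename()
                second = None
                tail.clear()
            if stream.filelineno() == 2:
                second = line
            tail.append(line)
    # emit the result for the final file, like the END rule
    if current is not None:
        yield current, second, list(tail)
```

Only three lines per file are kept at any time, so memory use stays small even for large inputs. Files with fewer than two lines yield None as the second line.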

Thanks for your answer. I am a beginner in awk. How can I change your code for multiple files? I also need to get the output to a spreadsheet or to a text file. — sagar, Jun 17 '12 at 02:33

score 1 · Answer 2 · answered Jun 16 '12 at 16:35

To avoid reading the entire file into memory at once, use a deque with a maxlen of 3 to create a rolling buffer for capturing the last 3 lines:

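from collections import deque

def get2ndAndLast3LinesFrom(filename):
    with open(filename) as infile:
        # rolling buffer that keeps only the 3 most recent lines
        last3 = deque(maxlen=3)
        # advance past first line (it still counts toward the last 3 in short files)
        last3.append(next(infile))
        # capture second line (next() raises StopIteration if the file has fewer than 2 lines)
        second = next(infile)
        last3.append(second)
        # iterate over the rest of the file a line at a time, saving the final 3
        last3.extend(infile)
        return second, list(last3)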

You could generalize this approach to a function that would take any iterable:

def lastN(n, seq):
    buf = deque(maxlen=n)
    buf.extend(seq)
    return list(buf)

Then you can create different length "last-n" functions using partial:

from functools import partial
last3 = partial(lastN, 3)

print(last3(range(100000000)))  # use xrange in Python 2
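
The question also asks about several files and a text file as the destination. A minimal sketch of applying the same rolling-buffer idea to that case follows. The names write_second_and_last3 and out_path are hypothetical, and the sketch assumes every input file ends with a newline so that lines from different files do not run together.

```python
from collections import deque
from itertools import islice


def write_second_and_last3(paths, out_path):
    """Write line 2 and the final 3 lines of each file in paths to out_path."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as infile:
                # first two lines, or fewer if the file is short
                head = list(islice(infile, 2))
                # seed the buffer so short files still report correct last lines
                tail = deque(head, maxlen=3)
                # stream the rest of the file, keeping only the newest 3 lines
                tail.extend(infile)
            if len(head) == 2:
                out.write(head[1])
            out.writelines(tail)
```

A call such as write_second_and_last3(["file1", "file2"], "summary.txt") would write four lines per file, assuming each file has at least three lines. A file with only one line contributes just that line.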

score 1 · Answer 3 · answered Jun 16 '12 at 19:59

If you aren't wedded to Python or AWK for the implementation, you can do something very straightforward using your shell and the standard head/tail utilities.

for file in "$@"; do
    head -n2 "$file" | tail -n1
    tail -n3 "$file"
done

You can also wrap this in a function or place it in a script, and then call it from Python with subprocess.check_output() (or from AWK with system()) if you really want, but in such cases it may just be easier to use native methods rather than spawning an external process.
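
As a rough illustration of the subprocess.check_output() route mentioned above, the sketch below calls head and tail directly from Python. It assumes both utilities are on the PATH, and the function name second_and_last3_via_shell is hypothetical. Instead of piping head into tail, it takes the second element of head's output.

```python
import subprocess


def second_and_last3_via_shell(path):
    """Return (second_line, last_three_lines) of path using head and tail."""
    # Output of `head -n 2`: the second entry is the line we want
    first_two = subprocess.check_output(
        ["head", "-n", "2", path], text=True
    ).splitlines(keepends=True)
    # Output of `tail -n 3`: the last three lines of the file
    last_three = subprocess.check_output(
        ["tail", "-n", "3", path], text=True
    ).splitlines(keepends=True)
    second = first_two[1] if len(first_two) > 1 else None
    return second, last_three
```

This spawns two processes per file, which is why the native approaches are usually simpler when the rest of the work is already in Python.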

score 0 · Answer 4 · edited May 23 '17 at 12:13

This would work, but it does load the entire file in memory, which might not be ideal if your files are very large.

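# filename is assumed to hold the path of the file to read
with open(filename) as f:
    text = f.readlines()

print(text[1], end="")  # print second line (index 1, since lists are zero-based)

for line in text[-3:]:  # print last three lines, in file order
    print(line, end="")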

There are also some good alternatives discussed here.

answered Jun 16 '12 at 12:14 · Junuxx

score 0 · Answer 5 · answered Jun 16 '12 at 14:28

I don't know about awk, but if you are using Python, I guess you will need something like this:

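# Read all lines of the input file (this loads the whole file into memory)
with open('test1.txt') as inf:
    lines = inf.readlines()

# Write the second line and the last three lines as plain text.
# A .txt name is used here because plain text written to a file named
# 'Spreadsheet.ods' would not be a valid spreadsheet.
with open('output.txt', 'w') as outf:
    outf.write(lines[1])
    outf.writelines(lines[-3:])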
