0

I need to write a simple script to analyze a large number of log files, using a combination of grep or awk to extract specified lines from each log and append them to result.log, together with the name of the log file from which those lines were extracted. Each of the log files looks like:

Detected 8 CPUs
Reading input ... done.
Setting up the scoring function ... done.

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
   6         -5.1      1.055      4.219
   7         -4.4      2.000      3.318
   8         -3.9      1.110      3.362
   9         -3.8      1.460      4.123
  10         -2.4      6.960      9.282
  11         -2.2      1.277      4.038
  12         -1.9      1.758      4.043
  13          3.1      2.144      4.284
Writing output ... done.

I need to extract from this only the first 5 lines of the table, consisting of

1         -6.8      0.000      0.000
2         -6.4      8.197     10.006
3         -5.9      1.227      2.791
4         -5.6      1.551      3.947
5         -5.2      1.061      3.325

and append them to result.log, which will look like:

   From file name1.log
       1         -6.8      0.000      0.000
       2         -6.4      8.197     10.006
       3         -5.9      1.227      2.791
       4         -5.6      1.551      3.947
       5         -5.2      1.061      3.325

  From file name2.log
       1         -6.8      0.000      0.000
       2         -6.4      8.197     10.006
       3         -5.9      1.227      2.791
       4         -5.6      1.551      3.947
       5         -5.2      1.061      3.325

So for N logs I should have 5N such lines, or N blocks each consisting of 5 ranking scores, in result.log.

The idea of the script is to loop over all the logs:

#!/bin/bash

for log in ./*.log2; do
  filename=$(basename "$log")
  # strip the .log2 suffix (the original ${filename/.log/} would leave a stray "2")
  filenamenoextension=${filename%.log2}
  # some command to extract the lines and append them to final_results.txt
done

So I only need to know the combination of grep or sed (to find the 5 lines in each log) and (maybe) awk to extract selected columns (e.g. only columns 1 and 2).

Thanks for help,

James

3 Answers

1

If the lines you want to extract are always at the same position in each log file, you can do something like:

#!/bin/bash

for log in ./*.log2; do
  echo "From $log" >> result.log
  head -n 12 "$log"|tail -n 5 >> result.log
done
Etan Reisner
  • yes, @EtanReisner, you're right. Fix it, so your comment will become something constructive. – Stefano Falsetto Sep 17 '14 at 14:13
  • thank you! yes, it works perfectly for such a simple case, but could you show me an example with grep + awk for a more complicated case? –  Sep 17 '14 at 14:16
  • @JamesStarlight you should state more precisely what you mean by _more complicated case_. If this is true: _Each of the log files looks like:_ then this is a simple and nice answer. – clt60 Sep 17 '14 at 14:22
  • for instance, when I need to extract some specified columns from each line, should I pipe something to awk? –  Sep 17 '14 at 14:25
  • @JamesStarlight if you want to extract columns, you should either 1.) edit the question or 2.) ask a new question. We can't cover every possible _complication_ in one answer, for example: if you wanted to transpose the matrix and write it out in random order with colored columns... – clt60 Sep 17 '14 at 14:27
  • @JamesStarlight yes, awk or maybe cut, with something like: `cut -d'DELIM' -f N`, where DELIM is a space `' '` or a tab `'\t'` or whatever you use, and N is the column or a comma-separated list of columns you need to extract. Be careful with spaces and cut: you need something like tr to squeeze or remove extra spaces. – Stefano Falsetto Sep 17 '14 at 14:28
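
To make the column discussion in these comments concrete, here is a minimal sketch of both variants. It assumes the table rows sit on lines 8 to 12 as in the sample log; the file name demo.log2 and the temporary directory are made up for illustration only.

```shell
#!/bin/bash
# Sketch: keep only the "mode" and "affinity" columns of the first 5 rows.
# demo.log2 is a made-up sample file; real logs are assumed to share its layout.
workdir=$(mktemp -d)
cat > "$workdir/demo.log2" <<'EOF'
Detected 8 CPUs
Reading input ... done.
Setting up the scoring function ... done.

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
   6         -5.1      1.055      4.219
Writing output ... done.
EOF

# Variant 1: awk splits on runs of whitespace, so no squeezing is needed.
cols_awk=$(head -n 12 "$workdir/demo.log2" | tail -n 5 | awk '{print $1, $2}')

# Variant 2: cut needs single-space delimiters, so squeeze with tr first.
# The leading space leaves field 1 empty, hence fields 2 and 3.
cols_cut=$(head -n 12 "$workdir/demo.log2" | tail -n 5 | tr -s ' ' | cut -d' ' -f2,3)

printf '%s\n' "$cols_awk"
```

The awk variant is usually the less fragile of the two, because it does not depend on how many spaces pad each column.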
0

From Ed Morton's fantastic answer here we get:

awk 'c&&c--;/^-----+/{print "From file "FILENAME; c=5}' name1.log name2.log ... > result.log

If you need extra leading indentation on the extracted lines, then you can change that first pattern to something like this:

c&&c--{printf "    ";print};
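
As a possible extension of the same countdown idiom, the sketch below also restricts the output to columns 1 and 2, which the question mentions as an example. The files a.log2 and b.log2 are made-up samples, and the sketch assumes every log has at least 5 rows after the separator line (otherwise the counter c would carry over into the next file).

```shell
#!/bin/bash
# Sketch: countdown idiom plus column selection; a.log2 and b.log2 are made up.
workdir=$(mktemp -d)
for name in a b; do
  cat > "$workdir/$name.log2" <<'EOF'
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
   6         -5.1      1.055      4.219
Writing output ... done.
EOF
done

cd "$workdir" || exit 1
# c is set to 5 on the separator line; each following line is printed
# (indented, columns 1 and 2 only) while c is still non-zero.
awk 'c && c-- {printf "    %s %s\n", $1, $2}
     /^-----+/ {print "From file " FILENAME; c = 5}' a.log2 b.log2 > result.log
cat result.log
```

The rule order matters: the printing rule comes first, so the separator line itself is tested while c is still 0 and is never printed.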
Etan Reisner
0

If the number of header lines may differ between files, you can also use the following:

grep -A5 -He '^----' *.log2 |
    sed -E 's/(.*)\.log2:-{5}.*/From file \1/;s/^[^-]+-//;/^--$/d' >result.log

prints:

From file c1
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
From file d
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
From file e
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325

The basic command is:

grep -A5 -He '^----' *.log2

which prints the needed information in the form:

c1.log2:-----+------------+----------+----------
c1.log2-   1         -6.8      0.000      0.000
c1.log2-   2         -6.4      8.197     10.006
c1.log2-   3         -5.9      1.227      2.791
c1.log2-   4         -5.6      1.551      3.947
c1.log2-   5         -5.2      1.061      3.325
--
d.log2:-----+------------+----------+----------
d.log2-   1         -6.8      0.000      0.000
d.log2-   2         -6.4      8.197     10.006
d.log2-   3         -5.9      1.227      2.791
d.log2-   4         -5.6      1.551      3.947
d.log2-   5         -5.2      1.061      3.325
--
e.log2:-----+------------+----------+----------
e.log2-   1         -6.8      0.000      0.000
e.log2-   2         -6.4      8.197     10.006
e.log2-   3         -5.9      1.227      2.791
e.log2-   4         -5.6      1.551      3.947
e.log2-   5         -5.2      1.061      3.325

That is, where

  • each line is prefixed with the name of the file it comes from, for easy manipulation,
  • the blocks of 5 lines are delimited with --
  • each file's block begins with the -----+------------+----------+---------- line

From this format you can do anything by piping it to awk, perl, sed and so on...
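
As one illustration of such piping, the sketch below feeds the grep output to awk and keeps only columns 1 and 2 under a "From file" header. The files c1.log2 and d.log2 are made-up samples, and the sketch assumes that file names contain neither "-" nor ":", since those characters are what grep uses to separate the name from the line.

```shell
#!/bin/bash
# Sketch: post-process "grep -A5 -H" output with awk; sample files are made up.
workdir=$(mktemp -d)
for name in c1 d; do
  cat > "$workdir/$name.log2" <<'EOF'
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -6.8      0.000      0.000
   2         -6.4      8.197     10.006
   3         -5.9      1.227      2.791
   4         -5.6      1.551      3.947
   5         -5.2      1.061      3.325
   6         -5.1      1.055      4.219
Writing output ... done.
EOF
done

cd "$workdir" || exit 1
grep -A5 -He '^----' *.log2 |
  awk '/^--$/   {next}                       # drop the block separators
       /:-----/ {sub(/:-.*/, "")             # header line: keep the file name
                 print "From file " $0; next}
                {sub(/^[^-]*-/, "")          # data line: strip the "name-" prefix
                 print "    " $1, $2}' > result.log
cat result.log
```

Modifying $0 with sub() makes awk re-split the fields, which is why $1 and $2 refer to the table columns after the prefix is removed.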

clt60