
I need to extract "tables" from big log files. A table is delimited by a line that contains:

IMPRESSION DE LA TABLE TSTR

and by the 4th occurrence of this line:

--------- ---------------------------------------------------------------

I cannot find a more precise definition of my table.

As an example, my log file contains:

SOME
TEXT
BEFORE
IMPRESSION DE LA TABLE TSTR                   1323
--------- ---------------------------------------------------------------
POSITION |          INDICE                   |          ARGUMENT
--------- ---------------------------------------------------------------
         | TYPE     |  VALEUR                | TYPE     |  VALEUR
--------- ---------------------------------------------------------------
    1    | CHAINE   | COMPTEUR1              | ENTIER   | 0
    2    | CHAINE   | COMPTEUR2              | ENTIER   | 0
    3    | CHAINE   | RHO_1                  | TABLE    | 5187
    4    | CHAINE   | RHO_2                  | TABLE    | 6029
--------- ---------------------------------------------------------------
ANY
OTHER
TEXT
FOLLOWS

And I would like to get:

IMPRESSION DE LA TABLE TSTR                   1323
--------- ---------------------------------------------------------------
POSITION |          INDICE                   |          ARGUMENT
--------- ---------------------------------------------------------------
         | TYPE     |  VALEUR                | TYPE     |  VALEUR
--------- ---------------------------------------------------------------
    1    | CHAINE   | COMPTEUR1              | ENTIER   | 0
    2    | CHAINE   | COMPTEUR2              | ENTIER   | 0
    3    | CHAINE   | RHO_1                  | TABLE    | 5187
    4    | CHAINE   | RHO_2                  | TABLE    | 6029
--------- ---------------------------------------------------------------

I found ideas at:

How to select lines between two patterns?

Extract lines between two patterns from a file

But none of them allows for finding the n-th occurrence of a pattern.

Note also that I have other tables in my file (with other names), that I don't want to extract.

Simpom

3 Answers


Short awk approach:

awk '/^IMPRESSION DE LA TABLE TSTR/{f=1}f && /^-/ && ++c==4{print; f=c=0}f' file

The output:

IMPRESSION DE LA TABLE TSTR                   1323
--------- ---------------------------------------------------------------
POSITION |          INDICE                   |          ARGUMENT
--------- ---------------------------------------------------------------
         | TYPE     |  VALEUR                | TYPE     |  VALEUR
--------- ---------------------------------------------------------------
    1    | CHAINE   | COMPTEUR1              | ENTIER   | 0
    2    | CHAINE   | COMPTEUR2              | ENTIER   | 0
    3    | CHAINE   | RHO_1                  | TABLE    | 5187
    4    | CHAINE   | RHO_2                  | TABLE    | 6029
--------- ---------------------------------------------------------------
RomanPerekhrest

Based on your input, this one should work:

awk '/^IMPRESSION DE LA TABLE TSTR/{ p = 1 };/^\-/{ c++; if ( c == 4  ){ print $0; c = 0; p = 0} }p'

Output:

IMPRESSION DE LA TABLE TSTR                   1323
--------- ---------------------------------------------------------------
POSITION |          INDICE                   |          ARGUMENT
--------- ---------------------------------------------------------------
         | TYPE     |  VALEUR                | TYPE     |  VALEUR
--------- ---------------------------------------------------------------
    1    | CHAINE   | COMPTEUR1              | ENTIER   | 0
    2    | CHAINE   | COMPTEUR2              | ENTIER   | 0
    3    | CHAINE   | RHO_1                  | TABLE    | 5187
    4    | CHAINE   | RHO_2                  | TABLE    | 6029
--------- ---------------------------------------------------------------

Explanation:

This one starts printing when it sees a line that starts with IMPRESSION DE LA TABLE TSTR. Then it counts the lines beginning with a - until the fourth occurrence. When the fourth one is reached, that line is printed and the counter is reset to zero. p is also set to 0, so printing is disabled until the next line that begins with IMPRESSION DE LA TABLE TSTR.

JFS31
  • Works great in case I have no other table before. But the pattern "------" is found multiple times before the desired table is reached, so it seems that the counter is useless then. – Simpom Sep 01 '17 at 09:42
  • In the beginning I forgot to set c to 0. But now it should work with multiple ones. I tried it by copying yours four times into a file. – JFS31 Sep 01 '17 at 09:46
  • That would print every 4th line that starts with `-` between the target tables. – Ed Morton Sep 01 '17 at 11:51
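
The behaviour raised in the comments comes from counting every line that starts with - regardless of whether a TSTR table is currently open. A minimal sketch of a guarded variant follows, assuming the table header always starts at the beginning of a line and that each delimiter line starts with -. The function name extract_tstr is hypothetical and only used for illustration; it is not part of the answer above.

```shell
# Sketch only: extract_tstr is a hypothetical helper name.
# Prints each TSTR table from its header line through the 4th dashed line.
extract_tstr() {
    awk '
        # Header line: start printing and reset the dashed-line counter.
        /^IMPRESSION DE LA TABLE TSTR/ { p = 1; c = 0 }
        # Print every line while inside the table.
        p { print }
        # Count dashed lines only while inside the table; the 4th one closes it.
        p && /^-/ && ++c == 4 { p = 0 }
    ' "$@"
}
```

It can be called as extract_tstr file. Because the counter is only incremented while p is set, dashed lines that belong to other tables are neither counted nor printed.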

Perl to the rescue:

perl -ne '
    ($table, $line) = (1, 0) if /IMPRESSION DE LA TABLE TSTR/;
    ++$line if /^-{9} -{63}$/;
    print if $table;
    ($table, $line) = () if 4 == $line;
' -- file
choroba