I'm trying to extract some text using awk.

Here is the sample file:

...
var a=2;
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
//this is a comment with brackets []

When I run the following command:

awk '/var/ , /;/' file

I get:

var a=2;
var x=[
        0, 1;

Expected result:

var a=2;
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];

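This is expected: the range ends at the first line containing `;`. Because `var a=2;` matches both the start and the end pattern, it is printed on its own, and the range that starts at `var x=[` ends at `0, 1;`.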
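An awk range pattern `start , end` behaves like an on/off flag whose end test also runs on the line that switched it on. The sketch below spells that out for `/var/ , /;/`; the helper name `range_as_flag` and the file `sample.txt` are illustrative only. On the question's sample it prints the same three lines as the original command.

```shell
#!/bin/sh
# range_as_flag is a hypothetical helper name; it rewrites awk '/var/ , /;/'
# with an explicit flag to show how the range pattern behaves.
range_as_flag() {
    awk '
        /var/ { inrange = 1 }            # start pattern switches the range on
        inrange { print }                # every line inside the range is printed
        inrange && /;/ { inrange = 0 }   # end pattern is tested on the same line too
    ' "$@"
}

# Sample input from the question (without the leading "...").
cat > sample.txt <<'EOF'
var a=2;
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
//this is a comment with brackets []
EOF

range_as_flag sample.txt
```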

The range should ignore a `;` on any line that matches this regex: `^[\t\ ]{1,}[0-9,]{1,}.*;$`

Any idea how to do this?
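One caveat with the regex idea: `^[\t\ ]{1,}[0-9,]{1,}.*;$` also matches the closing line `        3,2];` (`.*` absorbs the `]`), so it cannot tell the last row from the others. A different technique, swapped in here rather than taken from the question, is to count square brackets: a `;` at the end of a line only ends a declaration once every `[` has been matched by a `]`. This is a minimal sketch in portable awk, assuming declarations start with `var` at the beginning of a line and that no brackets appear inside strings or comments within a declaration (they would be counted too). `extract_vars` and `sample.txt` are hypothetical names.

```shell
#!/bin/sh
# extract_vars is a hypothetical helper: it prints each "var" declaration,
# keeping multi-line arrays together by counting square brackets.
# Assumption: brackets inside strings/comments of a declaration are not handled.
extract_vars() {
    awk '
        /^var[ \t]/ { inblock = 1; depth = 0 }   # a declaration opens a block
        inblock {
            print
            # gsub() returns the number of replacements; swapping each bracket
            # for itself leaves the line unchanged but counts them.
            # [[] and []] are bracket expressions for a literal [ and ].
            depth += gsub(/[[]/, "[") - gsub(/[]]/, "]")
            # a ";" at the end of the line closes the block only when
            # every "[" opened so far has been closed again
            if (depth <= 0 && /;[ \t]*$/) inblock = 0
        }
    ' "$@"
}

# Sample input from the question (without the leading "...").
cat > sample.txt <<'EOF'
var a=2;
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
//this is a comment with brackets []
EOF

extract_vars sample.txt
```

On this sample it prints exactly the expected result, including the single-line `var a=2;`, and skips `other text` and the comment line.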

wolfgunner
    How about `awk '/var/ , /];/' file` ? – anubhava Feb 28 '23 at 15:22
  • A generalization and enhancement of the concept could be seen here: https://stackoverflow.com/a/31112076/42580 – UlfR Feb 28 '23 at 15:35
  • @wolfgunner: side note: for `ERE`, you can trim `(\[|\(|\{).*[;].*(\]|\)|\{)` to simply `[[({].*;.*[]){]`. P.S. Do you actually want `{` in the right-hand-side match instead of `}`? – RARE Kpop Manifesto Feb 28 '23 at 17:34

5 Answers

You can just match `];` or `] *;` like this:

awk '/var/ , /] *;/' file

var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
anubhava
  • There could also be another line (e.g. a comment) like `//this is an array []`, and then the output also includes that last line! – wolfgunner Mar 03 '23 at 07:35
  • We are matching `]` followed by 0 or more whitespaces then a `;` but your comment line doesn't have `;` – anubhava Mar 03 '23 at 10:24
  • I suggest you edit your question and show updated input with expected output so that maybe I can understand it better. – anubhava Mar 03 '23 at 10:36

With your shown samples, please try the following GNU awk code. Written and tested with the shown samples only.

awk -v RS='(^|\n)var x=\\[[^]]*\\];' 'RT{sub(/^\n/,"",RT); print RT}' Input_file
RavinderSingh13

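Another option: read the whole file as a single record and print every match from `var` up to the closing `];`.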

awk 'BEGIN {
    RS="^$"                                         # read in the whole file
}
{
    while(match($0,/var([^]]|\]+[^];])*\]*\];/)) {  # look for var...];
        print substr($0,RSTART,RLENGTH)             # output match
        $0=substr($0,RSTART+RLENGTH)                # continue
    }
}' file

A slightly extended input file:

some text
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
more text

Output:

var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
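`RS="^$"` relies on gawk: there, `^` in a regex `RS` matches only at the start of the file, so `^$` can only match an empty file and the whole input becomes one record. POSIX leaves a multi-character `RS` unspecified. A possible portable variant, sketched below under that assumption, collects the input in a variable and runs the same `match()` loop in `END`; `extract_blocks` and `extended.txt` are hypothetical names, and `[]]` replaces `\]` as a literal `]`.

```shell
#!/bin/sh
# extract_blocks is a hypothetical helper: a portable take on the gawk loop
# above that collects the input in a variable instead of using RS="^$".
extract_blocks() {
    awk '
        { buf = buf $0 "\n" }      # append every line, restoring the newline
        END {
            # "var", then characters that do not form a "];" terminator, then "];"
            # ([]] is a bracket expression for a literal ])
            while (match(buf, /var([^]]|[]]+[^];])*[]]*[]];/)) {
                print substr(buf, RSTART, RLENGTH)    # output the match
                buf = substr(buf, RSTART + RLENGTH)   # continue after it
            }
        }
    ' "$@"
}

# The extended input from this answer.
cat > extended.txt <<'EOF'
some text
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
more text
EOF

extract_blocks extended.txt
```

It prints the same two blocks as the gawk version on this input.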
James Brown
mawk '/=[[]/,/[]];$/'
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];

Or shrink it down to this, though it's less clear:

gawk '/=\[/,/];$/'
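Both spellings work because a bracket expression takes `[` and `]` literally (`]` only when it comes first), so `[[]` matches `[` and `[]]` matches `]` without a backslash, while a lone `]` outside brackets is already an ordinary character. The sketch below runs both forms through plain `awk` on the question's sample and checks that they agree; `sample.txt`, `brackets.out` and `escaped.out` are illustrative file names.

```shell
#!/bin/sh
# Sample input from the question (without the leading "...").
cat > sample.txt <<'EOF'
var a=2;
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
other text
//this is a comment with brackets []
EOF

# Bracket-expression spelling: [[] is a literal "[", []] a literal "]".
awk '/=[[]/,/[]];$/' sample.txt > brackets.out
# Backslash spelling: "\[" escapes the bracket; a lone "]" is already literal.
awk '/=\[/,/];$/' sample.txt > escaped.out

cmp -s brackets.out escaped.out && echo "same output"
cat brackets.out
```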
RARE Kpop Manifesto

Assumptions:

  • no nested ]
  • each var is guaranteed to have a pair of matching [ and ]

One awk idea:

$ awk '/var/,/]/' file
var x=[
        0, 1;
        1,0;
        2,1;
        3,2];
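To see why the first assumption matters, here is a hypothetical input with a nested array (not from the question). The range stops at the first `]`, which belongs to the inner `[0, 1]`; `nested.txt` is an illustrative file name.

```shell
#!/bin/sh
# nested.txt is a made-up example of a nested array.
cat > nested.txt <<'EOF'
var m=[
        [0, 1],
        [1, 0]];
EOF

# Same command as above: the range ends on the first line containing "]".
awk '/var/,/]/' nested.txt
```

Only `var m=[` and `[0, 1],` are printed; the line `[1, 0]];` is never reached.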
markp-fuso