How can I get words between the first two instances of text/pattern?

Question

Input:

===================================
v2.0.0

Added feature 3
Added feature 4
===================================
v1.0.0

Added feature 1
Added feature 2
===================================

Output that I want:

v2.0.0

Added feature 3
Added feature 4

I tried this, but it matches from the first line of equals signs (=) to the LAST one, while what I want is the text between the FIRST TWO lines of equals signs (=).

score 3 · Accepted Answer · answered May 15 '19 at 10:54

Here is one in awk:

$ awk '/^=+$/{f=!f;if(f==1)next;else if(f==0)exit}f' file
v2.0.0

Added feature 3
Added feature 4

In pretty print:

$ awk '/^=+$/ {     # at ===...
    f=!f            # flag state is flipped
    if(f==1)        # if it is one (first ===...)
        next        # skip to the next record
    else if(f==0)   # if it is zero (second ===...)
        exit        # nothing more to do
}
f' file             # print

oguz ismail · Answer 2 · 2019-05-15T16:50:41.980

Here is another one in GNU sed:

$ sed -n '/^=\+$/,//{//!p;b};q' file
v2.0.0

Added feature 3
Added feature 4

/^=\+$/,// is shorthand for /^=\+$/,/^=\+$/; it selects the lines between two lines consisting only of equals signs, inclusive, and the commands inside the curly braces that follow are executed for those lines,
//!p is shorthand for /^=\+$/!p; it prints the incoming line unless it consists only of =s,
b branches to the end of the cycle (i.e. it skips q),
q exits sed once the selected lines have been printed.

The following version will work with all POSIX-compliant seds but it looks 2x more cryptic:

sed -n -e '/^=\{1,\}$/,//{//!p;b' -e '}' -e 'q' file

Note that these will not work if the input contains two consecutive lines made up entirely of = characters.
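If back-to-back separator lines are a concern, one possible workaround is sketched below in POSIX awk rather than sed. It assumes that a run of adjacent = lines should be treated as a single separator, which is my reading and not something the question states. The function name first_block and the variables seen and printed are made up for illustration.

```shell
# Hypothetical helper: print the first block of text that follows a
# separator line, treating adjacent separator lines as one.
first_block() {
    awk '
        # A line made up only of "=" characters.
        /^=+$/ {
            if (printed) exit   # a block was already printed: stop here
            seen = 1            # remember that a separator was seen
            next                # never print the separator itself
        }
        # Any other line after the first separator belongs to the block.
        seen { print; printed = 1 }
    ' "$@"
}
```

Usage would be first_block file, or piping the text into first_block on standard input.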

Looks even more cryptic. ++ – James Brown May 15 '19 at 10:59

RavinderSingh13 · Answer 3 · 2019-05-15T11:43:08.757

Could you please try the following too.

awk '/^=/{count++;next} count>=2{exit} {print}' Input_file

edited May 15 '19 at 11:43

answered May 15 '19 at 11:18

`count>=2` would be better in case there is a third `====` right next to the second. – James Brown May 15 '19 at 11:37
@JamesBrown, sure, done that now; thank you, sir, for letting me know. – RavinderSingh13 May 15 '19 at 11:43

Ed Morton · Answer 4 · 2019-05-15T16:36:14.147

With GNU awk for multi-char RS:

$ awk -v RS='(^|\n)=+\n' 'NR==2' file
v2.0.0

Added feature 3
Added feature 4

With any other awk the equivalent would be lengthier:

$ awk '
    /^=+$/ { prt(); next }
    { rec=rec $0 ORS }
    END { prt() }
    function prt() { if (++nr==2) printf "%s", rec; rec="" }
' file
v2.0.0

Added feature 3
Added feature 4

Note that the above can print any record, not just the 2nd one: just change 2 to whatever record number you want printed. You can also trivially add or change conditions, such as only printing the record if it contains some string, instead of or in addition to testing the record number. For example, to print the 17th record if it contains foo:

awk -v RS='(^|\n)=+\n' 'NR==17 && /foo/' file

Explanation: Your records are separated by === lines so set the Record Separator RS to a regexp that matches that description, then just print the record when the Number of Records (NR) reaches the number you want, i.e. 2 (because there's a null record before the first === line).
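As a rough sketch of the "just change 2" remark, the portable (non-GNU) version above could be wrapped in a shell function that takes the record number as an argument. The function name nth_record and the awk variable n are hypothetical names introduced here for illustration; the record-collecting logic is the same as in the lengthier script.

```shell
# Hypothetical wrapper: print the n-th record of "=" separated input.
# Usage: nth_record N [file...]
nth_record() {
    n=$1
    shift
    awk -v n="$n" '
        # A separator line closes the record collected so far.
        /^=+$/ { prt(); next }
        # Any other line is appended to the current record.
        { rec = rec $0 ORS }
        # The text after the last separator is a record too.
        END { prt() }
        # Print the record if it is the one asked for, then reset.
        function prt() { if (++nr == n) printf "%s", rec; rec = "" }
    ' "$@"
}
```

With the input from the question, nth_record 2 file would select the v2.0.0 block and nth_record 3 file the v1.0.0 block, since record 1 is the empty record before the first === line.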

edited May 15 '19 at 16:36

answered May 15 '19 at 13:40

Your first one is as clever, concise, and as confusing, as the sed solution! ;-) – Alex Harvey May 15 '19 at 15:33
We will have to agree to disagree on that I suspect. Your records are separated by `===` lines so set `RS` to a regexp that matches that description, then just print the record number you want - seems extremely clear, simple and obvious to me. It also has the advantage over the sed solution that it doesn't require a complete rewrite to print the 17th rather than the 2nd record or to test for other contents of the record or do anything else. – Ed Morton May 15 '19 at 15:35
No, no, I did understand it. Took me a few moments. Clear, I guess, is subjective. But I feel fairly certain that people who don't know AWK like you do will have no clue how it works. – Alex Harvey May 15 '19 at 15:39
Agreed. When you give a tool constructs to make job X (e.g. manipulating text in awk's case) easier, people have to read the man page to understand what those constructs mean. If you don't know that awk has an implicit while-read loop and implicit if-condition-action blocks with a default action of print, and that it treats all input as records separated by RS and the record number is stored in NR then you wouldn't have a chance of figuring out what that script does. On the other hand once you DO know those and 4 or 5 other fundamentals (fields/vars) every other text manipulation task is simple – Ed Morton May 15 '19 at 15:45
I already explained what `RS` is. A regexp is just a regexp, there's nothing unusual about `'(^|\n)=+\n'` to match a line of `=`s so I'm not going to explain regexp syntax here. `-v` and how awk works in general are well documented in the awk man page so I'm not going to explain those here either. Awk is the right tool for the job and to learn the basics of how awk works there are tons of online tutorials and documentation but the best place to start is the book Effective Awk Programming, 4th Edition, by Arnold Robbins. – Ed Morton May 15 '19 at 16:50
From `man awk`: `-v var=val ... Assign the value val to the variable var, before execution of the program begins. ` – Ed Morton May 15 '19 at 16:52
From `man awk`: `RS ... The input record separator, by default a newline.` – Ed Morton May 15 '19 at 16:53
See https://regex101.com/r/jNnDaM/1 for the explanation of the regexp `'(^|\n)=+\n'` – Ed Morton May 15 '19 at 16:55
@user3439894 `RS` **IS** a variable, the input Record Separator, and, just like it'd set any other variable, `-v RS=..` just sets it to a specific value other than its default of a newline (`\n` on UNIX and `\r\n` on Windows). Please do feel free to ask if you have any specific questions about any of that. I understand sometimes there's just one hurdle to get over to get to that shining moment of clarity! – Ed Morton May 15 '19 at 17:04
I'm not an `awk` noob but certainly not at all proficient with it, and I do understand RegEx better than `awk`. I think in my mind I was confusing the field separator with the record separator, because I tend to use `-F`, where `-v` is not required, over `FS` or `RS`, where prefacing either with `-v` is required. Anyway, just a big brain fart! Thanks for your answer even though it wasn't my OP, I still learned something! :) – user3439894 May 15 '19 at 17:24

Alex Harvey · Answer 5 · 2019-05-15T23:47:02.023

Arguably, a cleaner GNU sed solution:

sed -E '0,/^={35}$/d; //Q'

Or, if you are happy with the simpler regex proposed in other answers:

sed -E '0,/^=+$/d; //Q'

Further explanation:

The (extended, note -E) regex /^={35}$/ matches a line, like yours, that consists of exactly 35 equals signs. (The alternative regex /^=+$/ matches a line that is one or more equals signs.)
The command 0,/^={35}$/d selects all lines from the beginning of the file through the first occurrence of the pattern and deletes them.
The empty regex // makes sed reuse the last regex that was used as part of an address or s/// command.
The Q command is a GNU extension that causes sed to exit without printing the current line.
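To illustrate why the range starts at 0 rather than 1, here is a small comparison, assuming GNU sed is installed as sed (on some systems it is gsed). The sample function and its shortened ==== separators are made up for the demonstration. With address 0 the end regex may already match on line 1; with address 1 it is only tested from line 2 onward, so the range runs through the second separator instead.

```shell
# Hypothetical sample input whose very first line is a separator.
sample() { printf '%s\n' '====' 'v2.0.0' '====' 'v1.0.0' '===='; }

# Range 0,/re/: ends on line 1 itself, so only that line is deleted
# and the first block survives until Q quits at the next separator.
sample | sed -E '0,/^=+$/d; //Q'    # prints v2.0.0

# Range 1,/re/: the end regex is first tested on line 2, so the
# range swallows the first block and ends at the second separator.
sample | sed -E '1,/^=+$/d; //Q'    # prints v1.0.0
```

When the file has other text before the first separator, as in the test below, both forms should behave the same; the difference only shows when the separator is on line 1.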

Testing this:

# test.sh

cat > FILE <<EOF
other
text
===================================
v2.0.0

Added feature 3
Added feature 4
===================================
v1.0.0

Added feature 1
Added feature 2
===================================
EOF

gsed -E '0,/^={35}$/d; //Q' FILE

Output:

▶ bash test.sh 
v2.0.0

Added feature 3
Added feature 4

What's the significance of `▶ bash test.sh`, as I do not see any preface for its use. — user3439894, May 15 '19 at 16:22
Sorry if that's unclear. I just named my test script test.sh and then ran it using `bash test.sh`. The `▶` is just my prompt. @user3439894 — Alex Harvey, May 15 '19 at 23:48

score 1 · Answer 6 · answered May 15 '19 at 18:20

This might work for you (GNU sed):

sed -n '/^=\+$/{:a;n;//q;p;ba}' file

Turn off automatic printing with the -n option, so that lines are only printed by a p or P command.

On encountering a line consisting only of = characters, fetch the next line; if it matches the same regexp, quit. Otherwise print the current line and repeat.
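The loop described above can be spelled out one command per line. This is only a commented sketch of the same idea: it is wrapped in a function with the made-up name between_first_two_sed, and it writes the regex as =\{1,\} instead of =\+ on the assumption that this makes it acceptable to non-GNU seds as well.

```shell
# Hypothetical wrapper around the one-liner, one command per line.
between_first_two_sed() {
    sed -n '
        # On the first line made up only of "=" characters...
        /^=\{1,\}$/ {
            :a
            # ...replace the pattern space with the next input line.
            n
            # The empty regex reuses the last one: quit at a separator.
            //q
            # Otherwise print the line and jump back to label a.
            p
            ba
        }
    ' "$@"
}
```

It would be called as between_first_two_sed file, or with the text piped in on standard input.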
