
I have a file that I'd like to process with bash; awk, sed, grep or similar tools are all fine. A pair of patterns occurs multiple times on a single line. I would like to extract everything from each pattern1 to the next pattern2 (including the patterns themselves) and print each match on a separate line.

I have already tried using this:

cat file.txt | grep -o 'pattern1.*pattern2'

But this prints everything from the first pattern1 to the very last pattern2 on the line.
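A minimal reproduction of that behaviour, assuming the sample line from this question is held in a shell variable (the variable names `line` and `greedy` are only for illustration). Because `.*` is greedy, `grep -o` reports a single match instead of two:

```shell
# Sample line from the question, kept in a variable so no file is needed.
line='pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.'

# .* is greedy, so the match runs from the first pattern1 to the last pattern2.
greedy=$(printf '%s\n' "$line" | grep -o 'pattern1.*pattern2')
printf '%s\n' "$greedy"
```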

$ cat file.txt
pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.

I'd like to get:

pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
kvantour
RattleZ
  • Can you have `foo pattern1 bar pattern2` or `pattern1 foo pattern1 bar pattern2` or `pattern1 foo pattern2 bar pattern2` in your input? If so, include those cases in your question and show the expected output for each. – Ed Morton Apr 17 '19 at 14:34
  • I reopened this because the other question that this was previously closed as a dup of (https://stackoverflow.com/questions/3027518/how-to-do-a-non-greedy-match-in-grep) is asking about matching across multiple lines, which is a much easier problem to solve than matching within lines. It also doesn't contain a solution for standard UNIX tools, just for perl or GNU grep with its experimental -P option, and there are better (simpler, more efficient, more portable, more robust) solutions for matching across lines. – Ed Morton Apr 17 '19 at 15:20

3 Answers


Try GNU sed:

 sed -E 's/(pattern2).*(pattern1)(.*\1).*/\1\n\2\3/' file.txt
  • That'd fail for some cases of `extract everything between these two occurrences` that aren't included in the OP's sample input, so [I've asked](https://stackoverflow.com/questions/55728627/how-can-i-print-multiple-patterns-on-separate-lines#comment98138929_55728627) if those can occur. – Ed Morton Apr 17 '19 at 14:35
  • Your example prints just the first string between the two patterns but skips the second string. And yes, I have multiple special characters between those patterns. Also, for better understanding: sometimes I don't know whether the string between pattern1 and pattern2 occurs once, twice, three times, or x times. I'll get a piece of actual example code later to show what I mean. – RattleZ Apr 17 '19 at 21:22
  • No, don't just grab a random piece of code and throw it up for us to try to wade through. Take the time to thoughtfully create a [mcve] that demonstrates all aspects of your problem and which we can test a potential solution against. – Ed Morton Apr 17 '19 at 23:48

In case you don't have access to tools that support lookarounds, this approach, though lengthy, will work robustly using standard tools on any UNIX box:

awk '{
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
} 1' file

The above works by freeing up two characters, { and }, so that they can't exist in the input: it first changes every @ to @A, then every { to @B and every } to @C. It then replaces pattern1 with { and pattern2 with }, which lets it use those chars in a negated character class, [^{}], to find the target strings. Finally it returns all the changed chars to their original values. Here it is with some prints to make it more obvious what's happening at each step:

awk '{
    print "1): " $0 ORS
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    print "2): " $0 ORS
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    print "3): " $0 ORS
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    print "4): " $0 ORS
} 1' file
1): pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.

2): { this is the first content } this is some other stuff { this is the second content } this is the end of the file.

3): { this is the first content }
{ this is the second content }

4): pattern1 this is the first content pattern2
pattern1 this is the second content pattern2

pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
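
As a minimal sketch, the same technique can be wrapped in a shell function to check that literal @, { and } in the input survive the round trip. The function name `extract_pairs` and the test line are made up for illustration. The regexps here use bracket expressions (`[{]` and `[}]`) purely as a portability precaution, on the assumption that some awk versions may try to parse a bare { as the start of an interval expression:

```shell
# Hypothetical helper: reads lines on stdin and prints each
# pattern1...pattern2 span on its own line.
extract_pairs() {
    awk '{
        # escape @, { and } so that { and } are free to stand in for the patterns
        gsub(/@/,"@A"); gsub(/[{]/,"@B"); gsub(/[}]/,"@C")
        gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
        out = ""
        # collect every {...} span that contains no other brace
        while ( match($0,/[{][^{}]*[}]/) ) {
            out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
            $0 = substr($0,RSTART+RLENGTH)
        }
        $0 = out
        # restore the patterns first, then undo the escapes in reverse order
        gsub(/[}]/,"pattern2"); gsub(/[{]/,"pattern1")
        gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    } 1'
}

printf '%s\n' 'x pattern1 a{1}@b pattern2 y pattern1 c pattern2 z' | extract_pairs
# pattern1 a{1}@b pattern2
# pattern1 c pattern2
```

Because the negated character class cannot cross another {, an input such as `pattern1 foo pattern1 bar pattern2` yields only the innermost span, `pattern1 bar pattern2`.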
Ed Morton

This might work for you (GNU sed):

sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}' file

Set the option -n to print explicitly.

Only process lines that contain pattern1 followed by pattern2.

Prepend a newline to pattern1.

Remove up to and including the introduced newline.

Append a newline following pattern2.

Print the first line in the pattern space, delete it and repeat.
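
The steps above can be sketched as a multi-line script with one comment per step. The wrapper name `extract_with_sed` is hypothetical, and like the one-liner this assumes GNU sed (for `\n` in the replacement text):

```shell
# Hypothetical wrapper: reads lines on stdin, assumes GNU sed.
extract_with_sed() {
    sed -n '/pattern1.*pattern2/{
        # prepend a newline to the first pattern1
        s/pattern1/\n&/
        # remove everything up to and including that newline
        s/.*\n//
        # append a newline after the first pattern2 that follows
        s/pattern2/&\n/
        # print up to the newline, delete that part, rerun on the remainder
        P
        D
    }'
}

printf '%s\n' 'pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.' | extract_with_sed
# pattern1 this is the first content pattern2
# pattern1 this is the second content pattern2
```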

potong