How to find the first match of a multiline pattern?

Question

I know about the question "How to find patterns across multiple lines using grep?", but I think my problem is more complicated, so I need help.

I have a dictionary file, BCFile, that looks like this:

boundary
{
    inlet
    {
        type            fixedValue;
        value           uniform (5 0 0);
    }

    outlet
    {
        type            inletOutlet;
        inletValue      $internalField;
        value           $internalField;
    }

    ....
}

I am writing a script to print the inlet boundary condition type (fixedValue) and the outlet boundary condition type (inletOutlet).

If I use cat BCFile | grep "type" | awk '{printf $2}' | tr -d ";", it won't work because the keyword type occurs many times.

If I use awk -v RS='}' '/inlet/ { print $4 }' BCFile, it won't work either, because the keyword inlet also occurs many times.

I need a way to first find the keyword inlet and then search only between the closest { and } that follow it.

Does anyone know a smart way to do this?

Look for awk solutions that use `flag` variables; several appear here every week, e.g. `'/type/{t=1};/value/{v=1}; {t && v}' file` (may not be exactly right, hence posted as a comment). Good luck. – shellter Apr 05 '13 at 03:24
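
A minimal sketch of the flag idea from this comment, in portable awk. It assumes the block name sits alone on its line, as in the BCFile excerpt. The helper name bc_type and the temporary sample file are illustrative only and not part of the original thread.

```shell
#!/bin/sh
# Hypothetical sample file mirroring the BCFile excerpt from the question.
bcfile=$(mktemp)
cat > "$bcfile" <<'EOF'
boundary
{
    inlet
    {
        type            fixedValue;
        value           uniform (5 0 0);
    }

    outlet
    {
        type            inletOutlet;
        inletValue      $internalField;
        value           $internalField;
    }
}
EOF

# bc_type NAME FILE: print the "type" value of the first block called NAME.
# (bc_type is an illustrative helper name, not an existing tool.)
bc_type() {
    awk -v name="$1" '
        $1 == name && NF == 1 { found = 1; next }  # block name alone on a line: raise the flag
        found && $1 == "type" {                    # first "type" entry after the flag
            sub(/;$/, "", $2)                      # drop the trailing semicolon
            print $2
            exit                                   # stop at the first match
        }
        found && $1 == "}" { exit }                # block closed without a type entry
    ' "$2"
}

inlet_type=$(bc_type inlet "$bcfile")
outlet_type=$(bc_type outlet "$bcfile")
echo "inlet: $inlet_type"
echo "outlet: $outlet_type"
rm -f "$bcfile"
```

Because the comparison is on the whole first field, inletValue and inletOutlet do not raise the flag, and the exit keeps only the first match.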

Ed Morton · Accepted Answer · score 2 · 2013-04-05T14:02:53.097

Since you didn't provide the expected output for the input you posted, we're just guessing at what you want, but how about this in GNU awk:

$ cat tst.awk
BEGIN{ RS="\0" }
{
   print "inlet:",  gensub(/.*\yinlet\y[^}]*type\s+(\w+).*/,"\\1","")
   print "outlet:", gensub(/.*\youtlet\y[^}]*type\s+(\w+).*/,"\\1","")
}
$ gawk -f tst.awk file
inlet: fixedValue
outlet: inletOutlet

Explanation:

RS="\0"

= set the Record Separator to the NUL character, which should not occur in a text file, so gawk reads the whole file as a single record.

gensub(/.*\yinlet\y[^}]*type\s+(\w+).*/,"\\1","")

= look for the word inlet (\y matches a word boundary, so inletValue and inletOutlet are skipped) followed by any characters except a } (so you stop before the first } after inlet instead of the last } in the file), then the word type followed by white space (\s+). The word after that (\w+: letters, digits, and underscores) is the one you want printed, so capture it and replace the whole record with just that string, referenced as \\1.

Setting RS="\0", gensub(), and the \y, \s, and \w regexp operators are all gawk extensions.
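
For an awk without these extensions, the same idea (treat the file as one string, then look only as far as the first } after the keyword) can be sketched with the POSIX functions match(), index(), and substr(). This is an assumption-laden sketch and not part of the original answer: slurp_type and the sample file are hypothetical, and the name is treated as a whole word only when it is surrounded by white space.

```shell
#!/bin/sh
# Hypothetical sample file mirroring the BCFile excerpt from the question.
bcfile=$(mktemp)
cat > "$bcfile" <<'EOF'
boundary
{
    inlet
    {
        type            fixedValue;
        value           uniform (5 0 0);
    }

    outlet
    {
        type            inletOutlet;
        inletValue      $internalField;
        value           $internalField;
    }
}
EOF

# slurp_type NAME FILE: illustrative helper, not an existing tool.
slurp_type() {
    awk -v name="$1" '
        BEGIN { text = "\n" }                  # leading newline so a name on line 1 still matches
        { text = text $0 "\n" }                # accumulate the whole file in one string
        END {
            # Leftmost occurrence of NAME surrounded by white space.
            if (!match(text, "[ \t\n]" name "[ \t\n]")) exit 1
            rest = substr(text, RSTART + RLENGTH)
            # Keep only the text before the first closing brace after NAME.
            stop = index(rest, "}")
            if (stop) rest = substr(rest, 1, stop - 1)
            # Inside that block, pick the word that follows "type".
            if (match(rest, /type[ \t]+[A-Za-z0-9_]+/)) {
                entry = substr(rest, RSTART, RLENGTH)
                sub(/type[ \t]+/, "", entry)
                print entry
            }
        }
    ' "$2"
}

inlet_type=$(slurp_type inlet "$bcfile")
outlet_type=$(slurp_type outlet "$bcfile")
echo "inlet: $inlet_type"
echo "outlet: $outlet_type"
rm -f "$bcfile"
```

match() returns the leftmost occurrence, so when a name appears more than once the first block is the one reported.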

edited Apr 05 '13 at 14:02

answered Apr 05 '13 at 13:45

Ed Morton


Wow, my hat's off. Would you please add a few explanations to the syntax? :) – Daniel Apr 05 '13 at 13:52
@Daniel - explanations added. See also the gawk manual, http://www.gnu.org/software/gawk/manual/gawk.html. If you're going to be doing text file manipulation like this, I'd strongly recommend you get the book Effective Awk Programming, Third Edition By Arnold Robbins. – Ed Morton Apr 05 '13 at 14:04
Thank you so much. I am new to scripting and have great difficulty choosing between `sed` and `awk`. Which is more powerful or more flexible, or has a good and rewarding learning curve? Thanks – Daniel Apr 05 '13 at 14:09
Like all UNIX tools, you should use both tools as appropriate. sed is an excellent tool for simple substitutions on a single line; for any other text manipulation you should use awk. You can do a LOT of things in sed that you should NEVER do - that's because sed predated awk by a few years so it has a ton of language constructs to do things in ridiculously complicated ways just because back in the day (early 1970s!) there was no simpler alternative. The only commands you should use in sed are s, g, p (with -n), and d. – Ed Morton Apr 05 '13 at 14:16
By the way, being completely honest - best I can tell the reason experienced people still post sed solutions using more than those 4 constructs is because they simply enjoy the challenge of solving the problem with sed. I get that, kinda, but I wish they wouldn't as it misleads newcomers into thinking that's a reasonable approach. – Ed Morton Apr 05 '13 at 14:28

score 1 · Answer 2 · answered Apr 05 '13 at 03:31

Can you use perl?

#!/usr/bin/env perl

use strict;
use warnings;

my $filename = $ARGV[0];

open(my $f, '<', $filename) or die "Unable to open $filename: $!\n";
my $string = do { local($/); <$f> };
close($f);

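# Non-greedy .*? makes each "type" the first one after its keyword;
# \b keeps inletValue and inletOutlet from matching as "inlet".
if ($string =~ /\b(inlet)\b.*?type\s+(\w+).*?\b(outlet)\b.*?type\s+(\w+)/s) {
    print "$1: $2\n$3: $4\n";
}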

answered Apr 05 '13 at 03:31

Timothy Brown


Thanks a lot! But is there no way to do this with a shell script? – Daniel Apr 05 '13 at 03:52
UNIX shell is an environment from which to call tools. perl is a tool just like sed, grep, awk, etc. The only difference is that unlike the other tools I mentioned, perl doesn't come with all UNIX installations. – Ed Morton Apr 05 '13 at 13:47

score 1 · Answer 3 · answered Apr 05 '13 at 07:41

This might work for you (GNU sed):

sed -rn '/^\s*(inlet|outlet)/,/^\s*}/!b;/type/s/.*\s(\S+);.*/\1/p' file

Restricting the search for 'type' to the lines between 'inlet' (or 'outlet') and the next '}' makes the whole exercise easier.
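
The same range idea can be sketched for a single named block using only POSIX sed syntax. The helper name range_type and the sample file are hypothetical, and the block name is assumed to contain no regular-expression metacharacters.

```shell
#!/bin/sh
# Hypothetical sample file mirroring the BCFile excerpt from the question.
bcfile=$(mktemp)
cat > "$bcfile" <<'EOF'
boundary
{
    inlet
    {
        type            fixedValue;
        value           uniform (5 0 0);
    }

    outlet
    {
        type            inletOutlet;
        inletValue      $internalField;
        value           $internalField;
    }
}
EOF

# range_type NAME FILE: illustrative helper, not an existing tool.
# The address range runs from the line holding only NAME to the next line
# that starts with }. Inside it, the first "type" line is reduced to its
# value and printed, then sed quits so later blocks are never examined.
range_type() {
    sed -n "/^[[:space:]]*$1[[:space:]]*\$/,/^[[:space:]]*}/{
        /^[[:space:]]*type[[:space:]]/{
            s/^[[:space:]]*type[[:space:]]*\([^;]*\);.*/\1/p
            q
        }
    }" "$2"
}

inlet_type=$(range_type inlet "$bcfile")
outlet_type=$(range_type outlet "$bcfile")
echo "inlet: $inlet_type"
echo "outlet: $outlet_type"
rm -f "$bcfile"
```

Anchoring the start address with \$ means inletValue cannot open a range, and the q command is what limits the result to the first match.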

answered Apr 05 '13 at 07:41

potong

