modifying working regex to work with g/awk

Question

I have a working regex pattern:

^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+

It is designed to match an entire paragraph that starts with the word "Summary", as described in this question.

I am now seeking to have this work with gawk, e.g.

gawk '/^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+/{print}'

But the above statement is returning nothing.

As an alternative, I can use

gawk '/Summary/' myfile.txt

This returns only the single line of the paragraph that contains the word 'Summary'. Presumably I could use the RS variable to specify a different record separator.

There is no such thing as a standalone `working regexp`. Every tool supports different regexp flavors with its own caveats/extensions. I assume you think your regexp works because you've tested it with some online tool but that just proves it works with that online tool, not that it works with any specific command-line tool. The regexp you show will not work with any standard UNIX tool, nor will it work with GNU awk nor GNU sed. [edit] your question to include concise, testable sample input and expected output and we can help you solve your problem (using a regexp or otherwise). — Ed Morton, Sep 01 '17 at 11:58
Ed: I appreciate that regexes have different environments, hence the very clear question: it's about modifying a regex that works in one environment so that it can work in another, in this case awk. I've provided a link that demonstrates the functionality of the regex. — haz, Sep 01 '17 at 12:24
Even the answer you posted yourself isn't a modified version of that regexp, it's a different solution to the problem described in the linked question that uses 2 separate regexps and a range expression instead of a single regexp. I did glance at the other question but YMMV if you expect others to do so. Are you looking for a modified version of that regexp or are you looking for a solution to your problem using awk? In any case, if you include concise, testable sample input and expected output in your question before it gets voted closed as unclear then no doubt you'll get a good answer. — Ed Morton, Sep 01 '17 at 12:34
As per the question I am looking for a way to adapt the regex from one environment to another. — haz, Sep 01 '17 at 12:42

score 0 · Answer 1 · answered Sep 01 '17 at 07:34

It's better to use the following rather than a range expression; you can read more about range expressions in the discussion by Ed Morton and Scrutinizer here:

awk '/Summary/{f=1} f{print; if (/RefSeq/) f=0}' yourfile.txt
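To illustrate why the flag form is more flexible than a range expression, here is a sketch on hypothetical input (`sample.txt` and its contents are invented; the `RefSeq` end marker is taken from the answer). The first command is the answer's one-liner; reordering the same tests gives an exclusive version that prints only the lines between the markers, which with a range expression would, I believe, require repeating both regexps inside the action.

```shell
#!/bin/sh
# Hypothetical sample input.
printf 'Name: foo\nSummary: first line\n  second line\n  [provided by RefSeq]\nOther: bar\n' > sample.txt

# Inclusive: set f on the "Summary" line, print while f is set,
# and clear f after printing the "RefSeq" line.
awk '/Summary/{f=1} f{print; if (/RefSeq/) f=0}' sample.txt

# Exclusive: clear f before printing and set it after printing,
# so neither boundary line is output.
awk '/RefSeq/{f=0} f; /Summary/{f=1}' sample.txt
```

The first command prints the three lines from "Summary: first line" to "[provided by RefSeq]"; the second prints only "  second line".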

Akshay Hegde

Right, and there's probably an even simpler solution which we could help with once the OP provides sample input output. – Ed Morton Sep 01 '17 at 12:02
@EdMorton: Yeah, true, but the OP says [this](https://stackoverflow.com/questions/45973353/regex-to-select-entire-paragraph-by-matching-word-in-first-line) is his input; if so, I think the posts could be merged. – Akshay Hegde Sep 01 '17 at 12:06
Ah, the joy of clicking through posts to piece together a question :-). If the OP updates his question to be standalone I (and probably others) will take a look. – Ed Morton Sep 01 '17 at 12:10

haz · Accepted Answer · 2017-09-02T03:06:38.953

score -1

For my particular purpose I was aiming to capture a multi-line paragraph which began with the word "Summary" and ended with content in square brackets. I was able to use the following gawk statement, known as a range pattern, to get the behaviour I needed from the regex in question:

gawk '/Summary/,/\]/{print}' myfile.txt

range start: /Summary/   (a line containing "Summary")
range end:   /\]/        (the next line containing "]")

Note the escaped square bracket. This prints every line from a line containing "Summary" through the next line containing "]", inclusive.

See also this question.

Whilst this is not an answer to the question (how to modify a regex that works in one environment so that it works in awk), it is a workaround for the underlying problem, particularly in the absence of any other suggestions.

edited Sep 02 '17 at 03:06

answered Sep 01 '17 at 06:12

haz

Unclear as to why this is downvoted. This is an alternative solution to the problem. – haz Sep 01 '17 at 12:26
1) It will not produce the posted expected output given the posted sample input from [your referenced question](https://stackoverflow.com/questions/45973353/regex-to-select-entire-paragraph-by-matching-word-in-first-line) and 2) a range expression is never the best solution. – Ed Morton Sep 01 '17 at 12:30
Ed, it's the best solution I was able to come up with, and I chose to share my findings with the community. Can you also share a solution? – haz Sep 01 '17 at 12:37
I'm sure if you edit your question to include concise, testable sample input and expected output (i.e. a [mcve]) and clarify if you're looking for a modified regexp specifically or simply a working awk (or other tool?) script then you will get multiple solutions. See [ask] if that's not clear. – Ed Morton Sep 01 '17 at 12:38
I provided the regex I'm seeking to modify and, for brevity, a link to another question that refers to the exact same regex. – haz Sep 01 '17 at 12:43
Yes you did so now if that other question changes or is closed then no-one will know to come modify this one too and so others looking for solutions in future won't be able to use this fragmented Q&A. So, good luck with that if you're not willing to just create a standalone Q&A here. – Ed Morton Sep 01 '17 at 12:44
