I'm trying to create a shell script that would read through code and pull out sections.

An example of the code would be:

```
method_name1()
  [Marker] CODE-1234
    # Comment 1
    # Comment 2
  code...

method_name2()
  code...

method_name3()
  [Marker] CODE-456 possible other text
    # Comment 1
  code...

method_name4()
  [Marker] CODE-ABC
  code...
```

So what I'm trying to do is grep through all the source code looking for `[Marker] CODE-` and then also collect all of the immediately following lines that start with some whitespace followed by a pound (`#`) comment.

Not all methods would have a marker, not all markers have comments, and there can be one or more lines that make up the comment.

There is the grep parameter `-A`, which gets lines after a match, but it always takes a fixed number of lines, whereas this code could have zero to a hundred lines of comments after a marker.

I've thought about using `-B 1` to get the method name and `-A 5` to always get 5 lines and then separately parsing them for the `#` symbol, but I think there should be a cleaner way.

The end result would be a simple-to-read table such as:

| method       | marker    | comments                           |
|--------------|-----------|------------------------------------|
| method_name1 | CODE-1234 | Comment 1. Comment 2.              |
| method_name3 | CODE-456  | (possible other text) Comment 1.   |
| method_name4 | CODE-ABC  | --                                 |

So is it possible to "double grep"?

MivaScott
  • If POSIX grep had a multi-line flag, it'd be easy... Do you have the option of using `pcregrep` or `sed` from [these answers](https://stackoverflow.com/questions/2686147/how-to-find-patterns-across-multiple-lines-using-grep)? Edit: [-z will put it all on one line](https://stackoverflow.com/a/43398476/5514077) so you can make a multiline regex. – TheWandererLee Aug 15 '19 at 18:15

1 Answer

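This one searches for the literal `[Marker]`, matches the rest of that line, and then matches every following line that starts with optional spaces followed by `#`. The match stops at the first line that is not a `#` comment, such as the next `[...]` line. The flags do the work: `-P` enables Perl-compatible regexes, `-z` makes grep treat the whole file as one record so `\n` can appear in the pattern, and `-o` prints only the matched text. The trailing `[^\[]` consumes one more character that is not `[`, normally the newline that ends the block.

Notice that it does not match the final `# And this` line, because that line comes after a non-comment line. The `[Mark] Test` block is skipped as well, since it is not exactly `[Marker]`.

Here's my regex and its output (the `1:` prefixes are what `-n` prints; with `-z` the whole file is a single record, so the number is always 1):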

```
$ grep -Pzo '\[Marker\].*(\n *#.*)*[^\[]' test
```

```
1:[Marker] CODE-1234 possible other text
    # Comment 1
    # Comment 2

1:[Marker] Test
    # This
```
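One thing to be aware of: with `-z`, grep ends each match with a NUL byte rather than a newline, which can look odd in a terminal or confuse later pipeline stages. A minimal sketch of one way to handle that, assuming GNU grep built with PCRE support, is to pipe the output through `tr`. The function name `marker_blocks` is made up for this example, and the pattern is a slight variant of the one above: it drops the trailing `[^\[]` and also allows tabs before the `#`.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the original answer).
# Prints every [Marker] line plus the '#' comment lines directly below it.
# Assumes GNU grep with -P (PCRE) support.
marker_blocks() {
  # -P: Perl-compatible regex, -z: treat the file as one record so \n can match,
  # -o: print only the matched text. Each match is NUL-terminated under -z,
  # so tr turns those NUL bytes into newlines.
  grep -Pzo '\[Marker\].*(\n[ \t]*#.*)*' "$1" | tr '\0' '\n'
}
```

Calling `marker_blocks test` on the test file should then print the two matching blocks one after the other, each starting with its `[Marker]` line.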

Here's the test file:

```
[Marker] CODE-1234 possible other text
    # Comment 1
    # Comment 2

[Mark] Test
    # This

[Marker] Test
    # This
    Not this
    # And this
```
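The grep above only pulls out the marker blocks; it does not give you the method name or the table from the question. If a different tool is acceptable, a small awk state machine might be a cleaner fit than a "double grep". The following is only a sketch under assumptions taken from the sample in the question: method headers start at column 0 and look like `name()`, marker lines look like `[Marker] CODE-...`, and comments are whitespace followed by `#`. The function name `extract_markers` and the ` | ` separator are my own choices.

```shell
#!/bin/sh
# Hypothetical helper (name and output format are illustrative).
# Prints one "method | marker | comments" row per [Marker] CODE-... line.
extract_markers() {
  awk '
    # Print the pending row, if there is one, then reset the state.
    function emit_row() {
      if (marker != "") {
        if (comments == "") comments = "--"
        printf "%s | %s | %s\n", method, marker, comments
      }
      marker = ""; comments = ""
    }

    # Method header: an identifier followed by () at the start of the line.
    /^[A-Za-z_][A-Za-z0-9_]*\(\)/ {
      emit_row()
      method = $0
      sub(/\(.*/, "", method)
      next
    }

    # Marker line: keep the CODE-... id and any trailing text in parentheses.
    /^[ \t]*\[Marker\] CODE-/ {
      emit_row()
      marker = $2
      extra = $0
      sub(/^[ \t]*\[Marker\] [^ \t]+[ \t]*/, "", extra)
      if (extra != "") comments = "(" extra ")"
      next
    }

    # Comment line directly below a marker or below another such comment.
    marker != "" && /^[ \t]*#/ {
      text = $0
      sub(/^[ \t]*#[ \t]*/, "", text)
      if (comments != "") comments = comments " "
      comments = comments text "."
      next
    }

    # Any other line ends the comment run.
    { emit_row() }

    END { emit_row() }
  ' "$@"
}
```

Run on the sample from the question, `extract_markers file` should produce rows such as `method_name1 | CODE-1234 | Comment 1. Comment 2.` and `method_name4 | CODE-ABC | --`. Since it takes file arguments, it can be fed by a shell glob or by `find ... -exec`.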
TheWandererLee