
I'm trying to do an inverted match of a multi-line HTML `p` tag block with grep.

It works when not inverted:

grep -Pz '(?s)\s*<p id="internal_version">.*?</p>\s' test.html

but gives exit code 1 when inverted:

grep -Pzv '(?s)\s*<p id="internal_version">.*?</p>\s' test.html

I would have expected to get the non-matching content of the file.

Example test.html:

<!DOCTYPE html>
<html lang="en">
  <body>
    <div id="page-content-wrapper">
        <div class="container-fluid">
            <p id="internal_version">
                <font size="+1" color="red"><strong>Attention:</strong> <br>
                Text <br> 
                More Text 123 !%&/
            </p>
        </div>
    </div>
</body>
</html>
  • Use a proper syntax-aware parser like `xmllint`/`xmlstarlet` rather than `grep` – Inian Jun 22 '20 at 08:48
  • All-time classic https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – bipll Jun 22 '20 at 08:52
  • @bipll Reading the two most upvoted answers, it looks like I entered some sort of war zone between "don't use regex for html!!111" and "It's sometimes fine". I will have a look at xmllint/xmlstarlet, @Inian. But I'm still wondering why `grep -Pz` works and `grep -Pzv` doesn't – Nils Jun 22 '20 at 09:01
  • For exactly the same reason that `echo 123 | grep -v 2` prints nothing and returns 1. When you add `-z`, your whole file is treated as a single line, so if you invert the match, nothing is left to print: the file's only "line" matches the pattern, and `-v` selects only lines that do not match. – bipll Jun 22 '20 at 09:06
