
I have a standard .NET .csproj XML file that contains tags with text between them, like this:

<lorem>ipsum</lorem>
<PackageReleaseNotes>Some information.

- Some more information.</PackageReleaseNotes>
<lorem>ipsum</lorem>

I need a bash command that will extract the text, newlines and all, between the <PackageReleaseNotes> and </PackageReleaseNotes> tags.

I came up with `cat Useful.String.Extensions.csproj | grep -o -P '(?<=PackageReleaseNotes>).*(?=</PackageReleaseNotes>)'`, which works if the text between the tags contains no newlines. For the example above, however, it returns nothing.

J. Doe
  • 2
    Use a tool like `xmlstarlet` for processing XML files. – Barmar Apr 02 '20 at 20:55
  • 1
    [You can't parse \[X\]HTML with regex](http://stackoverflow.com/a/1732454/3776858). I suggest using an XML/HTML parser (e.g. xmlstarlet). – Cyrus Apr 02 '20 at 20:58
  • I don't know why everyone says you can't parse xml or whatever with regex. The accepted answer is EXACTLY what I am looking for. – J. Doe Apr 02 '20 at 21:55

1 Answer

cat Useful.String.Extensions.csproj | grep -Pzo '(?<=PackageReleaseNotes>)(.|\n)*(?=</PackageReleaseNotes>)'
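-z/--null-data
Treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Since the file contains no NUL bytes, grep reads it as a single line, which lets the pattern match across newlines.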
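
As a minimal sketch of how the command above might be used in a script, the following variant writes a sample file (a hypothetical stand-in for the real .csproj), swaps `(.|\n)*` for `(?s)` with a non-greedy `.*?`, and pipes the result through `tr -d '\0'` to strip the NUL byte that `-z` appends to each match. It assumes GNU grep built with PCRE support (`-P`); the variable names `sample` and `notes` are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for the real .csproj.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<lorem>ipsum</lorem>
<PackageReleaseNotes>Some information.

- Some more information.</PackageReleaseNotes>
<lorem>ipsum</lorem>
EOF

# -z:    read the whole file as one record, so a match may span newlines.
# (?s):  let "." match newlines; ".*?" stops at the first closing tag.
# tr -d '\0': drop the NUL byte that grep -z appends to each match.
notes=$(grep -Pzo '(?s)(?<=<PackageReleaseNotes>).*?(?=</PackageReleaseNotes>)' "$sample" | tr -d '\0')

printf '%s\n' "$notes"
rm -f "$sample"
```

The non-greedy match matters only if the file contains more than one `<PackageReleaseNotes>` element; with a single element it behaves the same as the original pattern.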
vtronko