Get text with newlines between two words in bash

Question

I have a standard .NET .csproj XML file that contains XML tags with text between them, like this:

<lorem>ipsum</lorem>
<PackageReleaseNotes>Some information.

- Some more information.</PackageReleaseNotes>
<lorem>ipsum</lorem>

I need a bash command that will extract the text, newlines and all, between the <PackageReleaseNotes> and </PackageReleaseNotes> tags.

I came up with cat Useful.String.Extensions.csproj | grep -o -P '(?<=PackageReleaseNotes>).*(?=</PackageReleaseNotes>)' and it works if the text between the tags does not have newlines. But for the case I used as an example, it returns nothing.

[You can't parse \[X\]HTML with regex](http://stackoverflow.com/a/1732454/3776858). I suggest using an XML/HTML parser such as xmlstarlet. – Cyrus, Apr 02 '20 at 20:58
I don't know why everyone says you can't parse XML or whatever with regex. The accepted answer is EXACTLY what I am looking for. – J. Doe, Apr 02 '20 at 21:55
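
For comparison, the parser-based route suggested in the comment above could look roughly like the sketch below. It assumes xmlstarlet is installed and that the project file is SDK-style XML without a default namespace (a namespaced file would need a different XPath). The file name `sample.csproj` and the function name `release_notes_xml` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file; a real .csproj wraps these tags in a root element.
cat > sample.csproj <<'EOF'
<Project>
  <PropertyGroup>
    <PackageReleaseNotes>Some information.

- Some more information.</PackageReleaseNotes>
  </PropertyGroup>
</Project>
EOF

# release_notes_xml FILE: print the text content of every PackageReleaseNotes element.
release_notes_xml() {
  xmlstarlet sel -t -v '//PackageReleaseNotes' "$1"
}

# Only run the query when the tool is actually available.
if command -v xmlstarlet >/dev/null 2>&1; then
  release_notes_xml sample.csproj
else
  echo "xmlstarlet not found; install it to try this sketch" >&2
fi
```

Unlike a regex, this keeps working if the tag gains attributes or the text contains XML entities, at the cost of an extra dependency.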

score 0 · Accepted Answer · answered Apr 02 '20 at 21:02

cat Useful.String.Extensions.csproj | grep -Pzo '(?<=PackageReleaseNotes>)(.|\n)*(?=</PackageReleaseNotes>)'

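-z/--null-data
Treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

Because the file contains no NUL bytes, grep then sees the whole file as a single "line", so the pattern can match across newlines.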
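
To see the effect of -z in practice, here is a minimal sketch, assuming GNU grep built with PCRE support (-P). The file name `sample.csproj` and the function name `release_notes` are hypothetical. It differs from the command above in three small ways: the file is passed to grep directly, `(?s).*?` stops at the first closing tag instead of the last one, and `tr -d '\0'` removes the NUL byte that grep -z appends to each match.

```shell
#!/usr/bin/env bash
# Hypothetical sample file mirroring the structure from the question.
cat > sample.csproj <<'EOF'
<Project>
  <PropertyGroup>
    <lorem>ipsum</lorem>
    <PackageReleaseNotes>Some information.

- Some more information.</PackageReleaseNotes>
    <lorem>ipsum</lorem>
  </PropertyGroup>
</Project>
EOF

# release_notes FILE: print the text between the PackageReleaseNotes tags.
# -z    read the whole file as one NUL-terminated record
# (?s)  let '.' match newlines; '.*?' is non-greedy
release_notes() {
  grep -Pzo '(?s)(?<=<PackageReleaseNotes>).*?(?=</PackageReleaseNotes>)' "$1" | tr -d '\0'
}

notes=$(release_notes sample.csproj)
printf '%s\n' "$notes"
```

The captured text keeps its internal blank line, which is what the single-line version without -z could not do.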

vtronko

What does `-z/--null-data Treat input and output data as sequences of lines.` mean? – J. Doe Apr 02 '20 at 21:50
Why the `cat` ? – Jetchisel Apr 02 '20 at 22:36
It was this way in the OP, I don't know how exactly OP wants to tweak it, hence I only applied change to the second part. You can pass filename to grep, of course. – vtronko Apr 02 '20 at 22:40
