How can I extract the content between the CDATA markers in the example below using sed (or another easy method)?
The tricky part is that the pattern must be evaluated across multiple lines, and only one part of the match must be kept in the extracted result. I expected powerful tools like sed or awk to be able to extract content from a file using a capturing regular expression, but so far without success.
Example of content:
<XmlBox className="com.example.ConfigData">
<xmlString><![CDATA[<ConfigData>
<myField>Here we go:
Yup.
</myField>
</ConfigData>]]></xmlString>
</XmlBox>
<XmlBox className="com.example.ServiceDefinition">
<xmlString><![CDATA[<ServiceDefinition>
<name>Tricky?</name>
</ServiceDefinition>]]></xmlString>
</XmlBox>
Expected result:
<ConfigData>
<myField>Here we go:
Yup.
</myField>
</ConfigData>
<ServiceDefinition>
<name>Tricky?</name>
</ServiceDefinition>
The related regular expression to capture it would be:
(?s)<XmlBox className=".+?">\s+<xmlString><!\[CDATA\[(.+?)\]\]></xmlString>\s+</XmlBox>
But how can I automate this in a simple bash command? I thought it would be easy, but is it?