Print text between two non-static strings

Question

I want to automate a script that parses an XML file and copies a section of it. I searched and found a way to do that, but it only works with simple fields like

<title> .... </title>

My aim is to copy this

<datasource enabled="true" jndi-name="java:/db_namePostgresDS" jta="true" pool-name="db_namePostgresDS" spy="false" use-ccm="false" use-java-context="true">
    THINGS AND FIELDS IN HERE
</datasource>

and paste it just after </datasource>. Then I will change values with sed. But I basically want to double that section.

I just can't get how to do it, and maybe it's an XY Problem. Any help?

EXAMPLE:

I have

<datasource enabled="true" jndi-name="java:/db_namePostgresDS" jta="true" pool-name="db_namePostgresDS" spy="false" use-ccm="false" use-java-context="true">
    THINGS AND FIELDS IN HERE
</datasource>

and I want to have

<datasource enabled="true" jndi-name="java:/db_namePostgresDS" jta="true" pool-name="db_namePostgresDS" spy="false" use-ccm="false" use-java-context="true">
    THINGS AND FIELDS IN HERE
</datasource>

<datasource enabled="true" jndi-name="java:/MODIFIED_NAME_HERE_PostgresDS" jta="true" pool-name="db_namePostgresDS" spy="false" use-ccm="false" use-java-context="true">
    MODIFIED THINGS AND FIELDS IN HERE
</datasource>

Important: I need to avoid installing new software on the machine (explicit customer request). XML parsers, if not built-in, aren't the way.

I should have specified it; I'll edit the post. I have to avoid installing software on the machine (customer request). – Wyatt Gillette, Sep 25 '17 at 13:00
Since it looks like you're parsing a standalone/domain.xml from Wildfly or JBoss EAP you're actually pretty safe without using an XML parser since these tools reformat their configuration at launch. It's still much more hassle parsing text than directly XML though. – Aaron, Sep 25 '17 at 13:30
The software on the machine doesn't contain `xsltproc`? It's a *very* widespread/standard tool. Same with, say, Python with the `ElementTree` standard-library module. Thus, I'm skeptical of the claim that there's no XML parser already on your target machine. – Charles Duffy, Sep 25 '17 at 15:29
(So if you have an XMLStarlet-based answer, you can use the `-C` argument to tell it to compute an XSLT template you can then apply with `xsltproc` anywhere with just the usual/basic set of packages installed). – Charles Duffy, Sep 25 '17 at 15:30
That's interesting. It seems that CentOS 7 ships xsltproc natively, even in the Core edition. BTW I don't know how to use it so I'll search, though that sed one-liner works very well, especially when using xmllint (also shipped with CentOS) to reformat the output. – Wyatt Gillette, Sep 26 '17 at 07:59

Aaron · Accepted Answer · 2017-09-25T15:25:35.110


I would use sed to extract the multiline XML tag:

orig_datasource=$(sed -n '/<datasource/{: l;N;/<\/datasource>/!bl;p}' your_input_file)

This command starts aggregating lines once it encounters the opening <datasource tag and prints the result once it has aggregated up to the closing </datasource> tag. *
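To make the loop easier to follow, here is a minimal annotated sketch of the same idea, written one sed command per line. The file name `sample.xml` and its contents are hypothetical placeholders. Unlike the command above, the pattern here ends with a space (`<datasource `); that is an assumption added so that a wrapper line such as `<datasources>` does not start the aggregation.

```shell
# Hypothetical sample input; element names inside the blocks are placeholders.
cat > sample.xml <<'EOF'
<datasources>
    <datasource jndi-name="java:/db_namePostgresDS" pool-name="db_namePostgresDS">
        <driver>postgresql</driver>
    </datasource>
    <datasource jndi-name="java:/otherDS" pool-name="otherDS">
        <driver>postgresql</driver>
    </datasource>
</datasources>
EOF

all_blocks=$(sed -n '
# Start when a line opens a datasource element. The trailing space keeps
# the wrapper <datasources> line from matching.
/<datasource /{
    # Loop label.
    :l
    # Append the next input line to the pattern space.
    N
    # No closing tag yet: jump back to the label and keep appending.
    /<\/datasource>/!bl
    # Closing tag seen: print the accumulated block.
    p
}' sample.xml)

printf '%s\n' "$all_blocks"
```

Without further changes this prints every datasource block in the file, one after another, not just the first one.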

The XML tag would be captured in an orig_datasource variable that I could then use both as-is and modified:

modified_datasource=$(echo "$orig_datasource" | sed 's/something/else/');
echo "$orig_datasource

$modified_datasource" > target_file

* : There are many ways this could fail (e.g. a closing tag written as </datasource > with a space before the > is valid XML but would not be recognised by the sed command). However, since it looks like you're working on a configuration file from JBoss EAP or Wildfly, you should be safe, because these tools reformat their configuration file at launch. Still, when an XML parser is available, using it is safer and easier than parsing the data as text.
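The snippet above writes only the two blocks to target_file. If the goal is to place the modified copy directly after the original inside the complete file, as the question describes, one possible sketch uses awk for the insertion step (awk is a swapped-in tool here, not part of the answer, and is assumed to be present on a base install). The file names, the replacement name `other_dbPostgresDS`, and the pattern with a trailing space are all hypothetical choices for illustration.

```shell
# Hypothetical input file standing in for the real configuration.
cat > input.xml <<'EOF'
<datasources>
    <datasource jndi-name="java:/db_namePostgresDS" pool-name="db_namePostgresDS">
        <driver>postgresql</driver>
    </datasource>
</datasources>
EOF

# Extract the first datasource block; q stops sed after the first match.
orig_datasource=$(sed -n -e '/<datasource /{' -e ':l' -e 'N' \
    -e '/<\/datasource>/!bl' -e 'p' -e 'q' -e '}' input.xml)

# Derive the modified copy (the substitution is only an example).
modified_datasource=$(printf '%s\n' "$orig_datasource" |
    sed 's/db_namePostgresDS/other_dbPostgresDS/g')

# Copy the file line by line and emit the modified block once,
# right after the first closing tag. The block is passed through the
# environment so awk does not interpret backslashes in it.
block="$modified_datasource" awk '
    { print }
    !done && /<\/datasource>/ { print ENVIRON["block"]; done = 1 }
' input.xml > output.xml

cat output.xml
```

Writing to a separate output.xml and checking it (for example with xmllint, which the question's comments mention is shipped with CentOS) before replacing the original keeps the real configuration untouched if something goes wrong.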

edited Sep 25 '17 at 15:25

answered Sep 25 '17 at 13:36


actually that command just stored the whole file in $orig_datasource... (yes, I'm using Wildfly8.1) – Wyatt Gillette Sep 25 '17 at 13:58
@WyattGillette I've fixed a mistake in the sed command, can you try it again? – Aaron Sep 25 '17 at 14:15
yep that's almost working! It finds also the duplicates so I will have to find a way to make it stop after the first closing match...what would be the best way to do it?...but that's a huge step forward! (just to understand...what was the mistake?) EDIT: so simple, added 'q' after 'p' :) Upvote granted and thnx a lot! – Wyatt Gillette Sep 25 '17 at 14:41
The mistake was that I wrote "when you encounter the closing tag, print the content of your hold buffer, then anyway jump to the start of the aggregating loop". That meant that after successfully printing a datasource, everything that followed was also aggregated as if it was inside a datasource tag. Now I've instead written "as long as you don't encounter the closing tag carry on aggregating, then (otherwise) print what you've aggregated", which correctly exits the loop at the end of a datasource tag. – Aaron Sep 25 '17 at 14:48
Amen. That's it. Ty – Wyatt Gillette Sep 25 '17 at 15:07
Usage of regexes to process an xml file is a terrible solution. It will work on simple use cases but `\` (with an escaped `<`) is not an xml tag but mere text, and tags can be nested, so no regex way can be robust. It may be fine here (so I didn't downvote) but you **really** should warn future readers of the possible caveats... – Serge Ballesta Sep 25 '17 at 15:20
@SergeBallesta I felt my last paragraph combined with the comments on OP's question were enough, but I've added a note explicitly mentioning the use of an XML parser regardless. It's quite safe in this context because the XML is XSD-validated and reformatted, so most common pitfalls don't apply (no possible nested tag, comments containing XML, extra spacing in tags, lack of linefeed, etc.; I think in your example the escaped `<` would be written as `&lt;`) – Aaron Sep 25 '17 at 15:30
Of course I would have preferred to answer with a simple `//ns:datasource` XPath ;) – Aaron Sep 25 '17 at 15:34
@Aaron: I feel it better now (and I will delete my comments in a while...) – Serge Ballesta Sep 25 '17 at 15:37
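
The fix mentioned in the comments, adding `q` after `p` so that sed quits after the first block, can be illustrated with a small sketch. The file `two.xml` and its contents are hypothetical, the one-liner form with `;` after the label assumes GNU sed (as shipped with CentOS), and the pattern uses a trailing space so the `<datasources>` wrapper line is not matched.

```shell
# Hypothetical file with two datasource blocks.
cat > two.xml <<'EOF'
<datasources>
    <datasource jndi-name="java:/firstDS" pool-name="firstDS">
        <driver>postgresql</driver>
    </datasource>
    <datasource jndi-name="java:/secondDS" pool-name="secondDS">
        <driver>postgresql</driver>
    </datasource>
</datasources>
EOF

# Without q: every block is printed.
every_block=$(sed -n '/<datasource /{:l;N;/<\/datasource>/!bl;p}' two.xml)

# With q: sed exits right after printing the first block.
first_only=$(sed -n '/<datasource /{:l;N;/<\/datasource>/!bl;p;q}' two.xml)

printf '%s\n' "$every_block" | grep -c '<datasource '   # prints 2
printf '%s\n' "$first_only" | grep -c '<datasource '    # prints 1
```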
