I have a flat file like this:

File: 
# Environment
Application.Env~DEV
# Identity
Application.ID~999
# Name
Application.Name~appname

And an XML file like this:

<name>Application/Env</name>
<value>XXX</value>
<name>Application/ID</name>
<value>000</value>
<name>Application/Name</name>
<value>AAA</value>

I'm looking for a script (awk, sed, etc.) that reads the flat file and, wherever a <name> tag in the XML matches the text before a ~, replaces the contents of the following <value> tag with the text after the ~. The resulting XML should look like this:

    <name>Application/Env</name>
    <value>DEV</value>
    <name>Application/ID</name>
    <value>999</value>
    <name>Application/Name</name>
    <value>appname</value>

Thanks for your help!

ocbit
  • BTW, your "XML like this" isn't actually good enough to validate correctness with. The header is important -- if the XML file you're processing starts with ``, that means something completely different than if it just started with ``. – Charles Duffy Jun 23 '16 at 21:01
  • ocbit: `sed`, `awk`, etc. can't deal with XML reliably -- the syntax is *not at all* context-free, meaning you need to keep track of which `xmlns` attributes have been seen in prior tags, whether you're in a comment, whether you're in a CDATA section, &c. to decide what to do at any given time. (And that's before dealing with things like entity expansion, or values that need to be escaped in order to avoid breaking the syntax). See also the related http://stackoverflow.com/a/1732454/14122 – Charles Duffy Jun 23 '16 at 21:26

2 Answers

Using XMLStarlet, this would look something like the following:

#!/bin/bash

# usage: [script] [flatfile-name] <in.xml >out.xml
flatfile=$1

# store an array of variables, and an array of edit commands
xml_vars=( )
xml_cmd=( )
count=0

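# IFS= keeps surrounding whitespace; the || test handles a final line with no newline
while IFS= read -r line || [[ $line ]]; do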
  [[ $line = *"~"* ]] || continue
  key=${line%%"~"*}   # put everything before the ~ into key
  key=${key//"."/"/"} # change "."s to "/"s in key
  val=${line#*"~"}    # put everything after the ~ into val

  # assign key to an XMLStarlet variable to avoid practices that can lead to injection
  xml_vars+=( --var "var$count" "'$key'" )

  # update the first value following a matching name
  xml_cmd+=( -u "//name[.=\$var${count}]/following-sibling::value[1]" \
             -v "$val" )

  # increment the counter used to assign variable names
  (( ++count ))
done <"$flatfile"

if (( ${#xml_cmd[@]} )); then
  xmlstarlet ed "${xml_vars[@]}" "${xml_cmd[@]}"
else
  cat # no edits to do
fi

This will run a command like the following:

xmlstarlet ed \
  --var var0 "'Application/Env'" \
  --var var1 "'Application/ID'" \
  --var var2 "'Application/Name'" \
  -u '//name[.=$var0]/following-sibling::value[1]' -v 'DEV' \
  -u '//name[.=$var1]/following-sibling::value[1]' -v '999' \
  -u '//name[.=$var2]/following-sibling::value[1]' -v 'appname'

...which replaces the first value after the name Application/Env with DEV, the first value after the name Application/ID with 999, and the first value after the name Application/Name with appname.
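
To see the key/value splitting from the loop in isolation, here is a minimal sketch. The function name split_line and the sample line Some.Key~a~b are made up for illustration, and tr is used as a portable stand-in for the bash-only ${key//"."/"/"} expansion so the snippet also runs under plain sh.

```shell
#!/bin/sh
# split_line (hypothetical helper): split one "Some.Key~value" line into $key and $val.
# Returns non-zero, leaving key/val untouched, when the line contains no "~".
split_line() {
  case $1 in
    *"~"*) ;;
    *) return 1 ;;
  esac
  key=${1%%"~"*}                          # drop everything from the first ~ onward
  key=$(printf '%s' "$key" | tr '.' '/')  # stand-in for ${key//"."/"/"}
  val=${1#*"~"}                           # drop everything up to and including the first ~
}

# 'Some.Key~a~b' is an invented line showing that only the first ~ splits.
for line in '# Identity' 'Application.ID~999' 'Some.Key~a~b'; do
  if split_line "$line"; then
    printf '%s -> %s\n' "$key" "$val"
  fi
done
# prints:
#   Application/ID -> 999
#   Some/Key -> a~b
```

Comment lines such as "# Identity" are skipped because they contain no ~, which is the same test the script's [[ $line = *"~"* ]] check performs.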


A slightly less paranoid approach might instead generate queries like //name[.="Application/Name"]/following-sibling::value[1], with the key pasted directly into the XPath. Passing the keys out-of-band as variables is done here as a security practice. Consider what could happen otherwise if the input file contained:

Application.Foo"or 1=1 or .="~bar

...and the resulting XPath (after the dot-to-slash substitution, which also rewrites the final .) were

//name[.="Application/Foo"or 1=1 or /=""]/following-sibling::value[1]

Because 1=1 is always true, this would then match every name, and thus change every value in the file to bar.
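
The failure mode can be reproduced without running XMLStarlet at all, just by building the query string the naive way. This is only a sketch: naive_query is a hypothetical helper, and tr again stands in for the script's ${key//"."/"/"} expansion.

```shell
#!/bin/sh
# naive_query (hypothetical): paste the key straight into the XPath expression.
# This is the pattern to avoid; it is shown only to make the problem visible.
naive_query() {
  key=${1%%"~"*}
  key=$(printf '%s' "$key" | tr '.' '/')
  printf '//name[.="%s"]/following-sibling::value[1]\n' "$key"
}

naive_query 'Application.Name~appname'
# prints: //name[.="Application/Name"]/following-sibling::value[1]

naive_query 'Application.Foo"or 1=1 or .="~bar'
# prints: //name[.="Application/Foo"or 1=1 or /=""]/following-sibling::value[1]
```

The quote character in the key closes the string literal early, so the rest of the key is parsed as XPath rather than compared as data.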

Unfortunately, the implementation of XMLStarlet doesn't effectively guard against this; however, using bind variables makes it possible for an implementation to provide such precautions, so a future release could be safe in this context.
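
Until then, one conservative workaround (my own suggestion, not something XMLStarlet provides) is to refuse any key containing characters outside a small allowlist before it gets anywhere near an XPath expression. The helper name is_safe_key and the exact character set are assumptions to adapt to your own key format.

```shell
#!/bin/sh
# is_safe_key (hypothetical): succeed only if the key is non-empty and consists
# solely of letters, digits, "_", "-", "." and "/" (an assumed allowlist).
is_safe_key() {
  case $1 in
    '' | *[!A-Za-z0-9_./-]*) return 1 ;;
    *) return 0 ;;
  esac
}

for key in 'Application/Name' 'Application/Foo"or 1=1 or /="'; do
  if is_safe_key "$key"; then
    printf 'ok: %s\n' "$key"
  else
    printf 'rejected: %s\n' "$key" >&2
  fi
done
```

In the script above, such a check would sit right after key is computed, with a continue for rejected lines. Quote characters are not in the allowlist, so a key can no longer terminate the string literal it is placed in.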

Charles Duffy

Using Perl and XML::XSH2, a wrapper around XML::LibXML:

#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;

open my $IN, '<', 'flatfile' or die $!;
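# limit the split to 2 fields so values containing ~ (or empty values) keep the hash paired up
$XML::XSH2::Map::replace = { map { chomp; split /~/, $_, 2 } grep /~/, <$IN> };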

xsh << 'end.';
    open 1.xml ;
    for //name {
        set following-sibling::value[1]
            xsh:lookup('replace', xsh:subst(., '/', '.')) ;
    }
    save :b ;
end.

I wrapped the XML in a <root> tag to make it well-formed.
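
The wrapping step itself isn't shown above. One way to do it from the shell, as a rough sketch: wrap_fragment is a hypothetical helper, and it assumes the fragment has no XML declaration or DOCTYPE of its own, since those would end up inside the root element.

```shell
#!/bin/sh
# wrap_fragment (hypothetical): read an XML fragment on stdin and emit it
# inside a single <root> element so that an XML parser will accept it.
wrap_fragment() {
  printf '<root>\n'
  cat
  printf '</root>\n'
}

printf '<name>Application/Env</name>\n<value>XXX</value>\n' | wrap_fragment
# prints:
#   <root>
#   <name>Application/Env</name>
#   <value>XXX</value>
#   </root>
```

With a real file this would be run as wrap_fragment <fragment.xml >1.xml, where fragment.xml is a placeholder name for the unwrapped input.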

choroba
  • Bravo for real XML-parser-based solutions. :) – Charles Duffy Jun 23 '16 at 21:13
  • @choroba thanks for the help but my shell doesn't support XML::XSH2 - Can't locate XML/XSH2.pm in INC. Any chance this script can be written using purely awk or sed even though it may not be reliable? My xml will only have name/value pairs like in my example. – ocbit Jun 27 '16 at 19:15
  • @ocbit: You should be able to install it via `cpan XML::XSH2`. – choroba Jun 27 '16 at 19:19
  • @choroba: Unfortunately no chance since this is a work computer. – ocbit Jun 27 '16 at 19:36