
I am working on installing Kylin on AWS EMR via a shell script. I have an XML file with the content below, from which I need to copy a particular document element into another XML file. This is a manual step that I need to automate with shell commands while the installation script runs.

/etc/hbase/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-nn-nn-nn-nn.ec2.internal</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ip-nn-nn-nn-nn.ec2.internal:xxxx/user/hbase</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>xxxx</value>
  </property>
</configuration>

I need to copy the hbase.zookeeper.quorum property from /etc/hbase/conf/hbase-site.xml to $KYLIN_HOME/conf/kylin_job_conf.xml, like this:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-nn-nn-nn-nn.ec2.internal</value>
</property>

Note: $KYLIN_HOME/conf/kylin_job_conf.xml already contains some other data.

The copied property needs to be appended to the target file.

The target file "$KYLIN_HOME/conf/kylin_job_conf.xml" looks like this:

<configuration>

    <property>
        <name>mapreduce.job.split.metainfo.maxsize</name>
        <value>-1</value>
        <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
    </property>

    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
        <description>Compress map outputs</description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
        <description>Compress the output of a MapReduce job</description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>The compression codec to use for job outputs
        </description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress.type</name>
        <value>BLOCK</value>
        <description>The compression type to use for job outputs</description>
    </property>

    <property>
        <name>mapreduce.job.max.split.locations</name>
        <value>xxxx</value>
        <description>No description</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>xxx</value>
        <description>Block replication</description>
    </property>

    <property>
        <name>mapreduce.task.timeout</name>
        <value>xxxx</value>
        <description>Set task timeout to 1 hour</description>
    </property>

</configuration>

Expected Output:

<configuration>

    <property>
        <name>mapreduce.job.split.metainfo.maxsize</name>
        <value>-1</value>
        <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ip-nn-nn-nn-nn.ec2.internal</value>
    </property>

</configuration>

Is there any shell command that can fetch a particular document element from the above XML file and copy it into another XML file automatically?

I have tried the command below:

awk 'NR == FNR { if(FNR >= 30 && FNR <= 33) { patch = patch $0 ORS }; next } FNR == 88 { $0 = patch $0 } 1' /etc/hbase/conf/hbase-site.xml $KYLIN_HOME/conf/kylin_job_conf.xml > $KYLIN_HOME/conf/kylin_job_conf.xml

The above command didn't work for me. Can someone help me resolve this?

Sai

1 Answer


It is rarely a good idea to try to query XML files with regexes.
Always prefer an XML parser!

You can achieve the given task with xmlstarlet. It is a single program that can extract the data you want from your input ("input.xml") in one command:

xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']" input.xml

Its output is:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-nn-nn-nn-nn.ec2.internal</value>
</property>

If you do not have it installed on your machine, run:

sudo apt-get -y install xmlstarlet
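
To verify the installation (a quick sanity check; it just prints the tool's version):

xmlstarlet --version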

The command line options are:

  • sel: select data from or query XML document(s) (XPath, etc.)
  • -t : template mode: interpret the following options as a template
  • -c : print a copy of the node(s) selected by the following XPath expression
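
Applied to the file from your question, the same selection can be run directly against the live HBase config and captured in a shell variable (a small sketch; the variable name quorum_xml is just illustrative):

quorum_xml=$(xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']" /etc/hbase/conf/hbase-site.xml)

echo "$quorum_xml"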

Now, in a second step, copy the resulting XML to the target file. This is possible with the method described in this SO answer: "How do I use xmlstarlet to append xml files with multiple sub node?"

Applied to your example, the following command line achieves what you want:

xmlstarlet ed -a "/configuration/property[last()]" -t elem -n property \
-v "$(xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']/*" input.xml)" \
target.xml | xmlstarlet unesc | xmlstarlet fo > new_target.xml

The result in new_target.xml is

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.job.split.metainfo.maxsize</name>
    <value>-1</value>
    <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
    <description>Compress map outputs</description>
  </property>

  ...

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-nn-nn-nn-nn.ec2.internal</value>
  </property>
</configuration>
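
To run this against the real files from your question, write to a temporary file first and only replace the target once the pipeline has succeeded; redirecting straight onto the file you are reading truncates it before xmlstarlet can read it (a sketch, assuming $KYLIN_HOME is set):

TARGET="$KYLIN_HOME/conf/kylin_job_conf.xml"
xmlstarlet ed -a "/configuration/property[last()]" -t elem -n property \
-v "$(xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']/*" /etc/hbase/conf/hbase-site.xml)" \
"$TARGET" | xmlstarlet unesc | xmlstarlet fo > "$TARGET.tmp" && mv "$TARGET.tmp" "$TARGET"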

However, this method has one disadvantage: it unescapes all entities in the target file (with the xmlstarlet unesc command), so an entity like &amp; will be converted to a bare &, which may break things.

If this is a problem, consider using a solution with a full XSLT processor and a stylesheet.
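
As an illustration of that route, here is a minimal sketch using xsltproc with an identity-transform stylesheet (the stylesheet path /tmp/append-quorum.xsl is just a placeholder, not part of the method above). It appends the property without the unescape round-trip, so existing entities in the target stay intact:

cat > /tmp/append-quorum.xsl <<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- identity transform: copy everything unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- on the root element, copy its children, then append the quorum property -->
  <xsl:template match="/configuration">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:copy-of select="document('/etc/hbase/conf/hbase-site.xml')/configuration/property[name='hbase.zookeeper.quorum']"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL

xsltproc /tmp/append-quorum.xsl "$KYLIN_HOME/conf/kylin_job_conf.xml" > /tmp/kylin_job_conf.xml.new \
&& mv /tmp/kylin_job_conf.xml.new "$KYLIN_HOME/conf/kylin_job_conf.xml"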

zx485
  • Thanks for the command. I have tried it and it is working, but I need to paste the output into another XML file. As you suggested, I tried xmlstarlet with `ed -i` to insert the output into another XML file, but I have little knowledge of xmlstarlet and was not successful. Can you suggest a command that would paste the result into another XML file and solve my problem? – Sai May 08 '20 at 13:56
  • Going by the sample in your question, you could use output redirection like `xmlstarlet ... > output.xml`, but I'm not sure that is what you want; I would need to know the structure of the target XML. I guess you were on the right track with `xmlstarlet ed -i ...`: pipe the output of the above command into it. Please edit your question with the output XML. – zx485 May 08 '20 at 17:16
  • I have edited the question with the target XML file and the expected output. I tried the command `sudo bash -c "xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']" /etc/hbase/conf/hbase-site.xml > $KYLIN_HOME/conf/kylin_job_conf.xml"`, but when I check the output file, i.e. $KYLIN_HOME/conf/kylin_job_conf.xml, the existing data has been removed and the file is empty. I'm not sure why this is happening in my case. I also tried `xmlstarlet ed -i ...` with my basic understanding, but I couldn't form a structured query. Please help me with this. – Sai May 09 '20 at 20:08
  • Thanks for updating with the new command. It is working fine in my case. – Sai May 12 '20 at 22:48