I need help extracting an entire XML element, from start tag to end tag, when a pattern matches inside it. For example, I have this in my XML file:

      <entry>
    <log_time>20150618-00:06:30</log_time>
    <description><![CDATA[Connection established]]></description>
    <service>SSH</service>
    <sessionid>02881141</sessionid>
    <type>0</type>    <severity>0</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:63530</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>
  <entry>
    <log_time>20150618-00:06:30</log_time>
    <description><![CDATA[Sent server version: SSH-2.0-0]]></description>
    <service>SSH</service>
    <sessionid>08878297</sessionid>
    <type>0</type>    <severity>1</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:63529</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>
  <entry>
    <log_time>20150616-00:00:00</log_time>
    <description><![CDATA[SSH Transport agreed algorithms
Key exchange algorithm: diffie-hellman-group14-sha1
Server host key algorithm: ssh-rsa
Client encryption algorithm: aes256-ctr
Client MAC algorithm: hmac-sha1
Client compression algorithm: none
Client language: 
Server encryption algorithm: aes256-ctr
Server MAC algorithm: hmac-sha1
Server compression algorithm: none
Server language: 
]]></description>
    <service>SSH</service>
    <sessionid>48018549</sessionid>
    <type>0</type>    <severity>1</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:60580</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>

My pattern is the client IP, 10.10.11.201 in this example. I have a list of IPs to look for in multiple XML files, and the entries are not uniform: some have more lines than others. For this reason I cannot use "grep" with -B or -A; the match has to be bounded by the start tag <entry> and the end tag </entry> so that I get the entire transaction for that IP.

Let me try to better put what I'm looking for. For example, I'm looking for lines with 10.10.11.201:

<cliconnaddr>10.10.11.201:63529</cliconnaddr>

When this is found, I need the entire start-end tag:

  <entry>
    <log_time>20150618-00:06:30</log_time>
    <description><![CDATA[Sent server version: SSH-2.0-0]]></description>
    <service>SSH</service>
    <sessionid>08878297</sessionid>
    <type>0</type>    <severity>1</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:63529</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>

Preferably using bash, awk, sed, perl.

Thanks!

  • It's not clear what your desired output should be. There are a lot of tags in there, and also a lot of IPs. What exactly do you want, and have you given it a try? Most likely [you do not want to parse XML with regex](http://stackoverflow.com/a/1732454/1331451). Use a parser instead. – simbabque Jan 13 '16 at 07:30
  • simbabque, I updated the post to better understand what I'm looking for. thanks! – jefrey cayab Jan 13 '16 at 07:47

2 Answers

You can use XML::Twig to do that. The code registers a handler that is invoked for every cliconnaddr element; when the element's text matches, the handler grabs the parent entry and prints it.

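use strict;
use warnings;
use feature 'say';    # needed for say()
use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each <cliconnaddr>; $_ is that element.
        # Print the enclosing <entry> when the address matches exactly.
        cliconnaddr => sub { say $_->parent->toString if $_->text eq '10.10.11.201:63529' }
    }
);
# Parse the sample document stored after __DATA__.
$twig->parse( \*DATA );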

__DATA__
<root>
      <entry>
    <log_time>20150618-00:06:30</log_time>
    <description><![CDATA[Connection established]]></description>
    <service>SSH</service>
    <sessionid>02881141</sessionid>
    <type>0</type>    <severity>0</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:63530</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>
  <entry>
    <log_time>20150618-00:06:30</log_time>
    <description><![CDATA[Sent server version: SSH-2.0-0]]></description>
    <service>SSH</service>
    <sessionid>08878297</sessionid>
    <type>0</type>    <severity>1</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:63529</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>
  <entry>
    <log_time>20150616-00:00:00</log_time>
    <description><![CDATA[SSH Transport agreed algorithms
Key exchange algorithm: diffie-hellman-group14-sha1
Server host key algorithm: ssh-rsa
Client encryption algorithm: aes256-ctr
Client MAC algorithm: hmac-sha1
Client compression algorithm: none
Client language:
Server encryption algorithm: aes256-ctr
Server MAC algorithm: hmac-sha1
Server compression algorithm: none
Server language:
]]></description>
    <service>SSH</service>
    <sessionid>48018549</sessionid>
    <type>0</type>    <severity>1</severity>
    <lstnconnaddr>10.10.10.100:22</lstnconnaddr>
    <cliconnaddr>10.10.11.201:60580</cliconnaddr>
    <sguid>04AD6AD5-FB2E-4F03-7993-447648CC3EED</sguid>
  </entry>
</root>
simbabque
  • Thanks, simbabque. Just a follow-up one - how can I use your code if I have a list of IPs in IPs.txt and to query multiple xml files? – jefrey cayab Jan 13 '16 at 08:39
  • @jefreycayab you have to change some things, but yes. I used `DATA` as an example. You can do `parsefile` and give it file names that could come from the command line, and use a list of IPs in the code, maybe as a lookup hash. I'll leave that as an exercise for you. :) – simbabque Jan 13 '16 at 08:41
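
For the follow-up about a list of IPs and several files, here is a minimal sketch of the same idea. It is written in Python with the standard-library `xml.etree.ElementTree` rather than XML::Twig, so it is not the code from the answer above. It assumes that IPs.txt holds one address per line without a port, that each XML file is well-formed with a single root element, and that `<cliconnaddr>` has the form `ip:port`. The function names `load_ips` and `matching_entries` are made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET


def load_ips(path):
    """Read one IP per line into a set, skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def matching_entries(root, ips):
    """Yield every <entry> whose <cliconnaddr> host part is in ips."""
    for entry in root.iter("entry"):
        addr = entry.findtext("cliconnaddr", default="").strip()
        host = addr.rsplit(":", 1)[0]  # drop the ":port" suffix (assumed format)
        if host in ips:
            yield entry


def main(argv):
    # argv[1] is the IP list, the remaining arguments are XML files.
    ips = load_ips(argv[1])
    for xml_path in argv[2:]:
        root = ET.parse(xml_path).getroot()
        for entry in matching_entries(root, ips):
            print(ET.tostring(entry, encoding="unicode").rstrip())


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        main(sys.argv)
```

It could be run as `python3 find_entries.py IPs.txt first.xml second.xml` (the script and file names are hypothetical). ElementTree does not preserve CDATA sections, so the description text is printed as ordinary escaped character data.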

If I get it right, what you're trying to do is: filter your list of <entry> elements by the value of their <cliconnaddr>. To me, this smells like XSLT!

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:h="http://www.w3.org/1999/xhtml">
    <xsl:output encoding="UTF-8" method="xml" version="1.0" indent="yes"/>
    <!-- Catch-all templates -->
    <xsl:template match="@*|text()">
        <xsl:copy-of select="."/>
    </xsl:template>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:template>
    <xsl:template match="processing-instruction()">
        <xsl:copy/>
    </xsl:template>
    <!-- specific part -->
    <xsl:template match="entry[cliconnaddr[text()!='10.10.11.201:63529']]"/>
</xsl:stylesheet>

What this XSLT does is: copy everything except entries whose <cliconnaddr> does not have a value of "10.10.11.201:63529". In other words, only the matching entries are kept. As this is XSLT 1.0, it should be easy to find an XSLT processor that runs in your context.

Kim Homann
  • Thanks, Kimmy. Just a follow-up one - how can I use your code if I have a list of IPs in IPs.txt and to query multiple xml files? – jefrey cayab Jan 13 '16 at 08:39
  • Instead of writing `entry[cliconnaddr[text()!='10.10.11.201:63529']]`, you can concat several IP addresses like this: `entry[cliconnaddr[text()!='10.10.11.201:63529'][text()!='10.10.11.202:63530']]`. The only problem is how to move the list of IP addresses from your text file to the XSL without having to manually copy/paste them. I cannot tell you what's the best solution for you. – Kim Homann Jan 13 '16 at 11:51
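
One possible way to avoid the manual copy/paste is to generate the match pattern from the list. The following is only a sketch in Python, not part of the answer above. It assumes the list holds bare IP addresses without ports, so it swaps the exact comparison `text()!='...'` for the XPath 1.0 function `starts-with()`, while keeping the chained-predicate form from the comment. The function name `build_match_pattern` is hypothetical.

```python
import ipaddress


def build_match_pattern(ips):
    """Build an XSLT match pattern that selects every <entry> whose
    <cliconnaddr> starts with none of the given IPs (followed by ':')."""
    if not ips:
        raise ValueError("need at least one IP address")
    predicates = []
    for ip in ips:
        # Raises ValueError on anything that is not an IP address,
        # which also keeps stray quotes out of the pattern.
        ipaddress.ip_address(ip)
        predicates.append("[not(starts-with(., '{}:'))]".format(ip))
    return "entry[cliconnaddr{}]".format("".join(predicates))


if __name__ == "__main__":
    print(build_match_pattern(["10.10.11.201", "10.10.11.202"]))
```

The returned string would replace the value of the `match` attribute in the last template of the stylesheet. Reading the addresses from the text file and writing the finished stylesheet are left out here.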