
I have a small data set in an XML format:

    <symbolgroupdef id="bin_11-QQQQ">
      <symbol>QQQ</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>

I want to print each symbolgroupdef line, together with the symbol line under it, for every group that contains a given symbol. The same symbol may appear in several symbolgroupdef groups.

Here is the code I have written so far:

#!/usr/bin/perl
use warnings;
use strict;

my $symbol = $ARGV[0];    # symbol to search for (not used yet)
my $sym_file = "/data/xmlconfig/config.xml";
open my $sym_fh, '<', $sym_file or die "Cannot open $sym_file: $!";
while (my $line = <$sym_fh>) {
    # so far this only prints every symbolgroupdef line
    if ($line =~ /<symbolgroupdef id=".*">/) {
        print $line;
        sleep 1;
    }
}

Basically, what I want is something that will find each symbolgroupdef id line, look for the specified symbol under it, and, if it finds it, print the symbolgroupdef id line and the symbol line under it. The symbol is given on the command line as $ARGV[0].

In the example above, with AAPL as the argument, these two pairs of lines should be printed:

<symbolgroupdef id="bin_6-AAPL">
<symbol>AAPL</symbol>
<symbolgroupdef id="bin_7">
<symbol>AAPL</symbol>

I don't have any modules on this machine and can't install any, so please forgive me for parsing XML without a module.

i alarmed alien
capser
  • Instead of looking for the symbolgroupdef line, why not search for the symbol while keeping track of what the current symbolgroupdef is? Alternatively, how about using a multiline regex that will search for `<symbolgroupdef>` chunks that contain `TARGET`? – i alarmed alien Oct 08 '14 at 00:18
  • That is a really good idea, thank you. – capser Oct 08 '14 at 03:11
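The multiline-regex idea from the comment above could be sketched as follows. This is a minimal sketch, not an XML parser: it assumes symbolgroupdef elements are never nested and that the markup stays as regular as the sample (no comments, CDATA sections, or single-quoted attributes). The helper name groups_containing and the inline sample are illustrative, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan the whole document at once and return the id of
# every <symbolgroupdef ...>...</symbolgroupdef> chunk whose body contains
# <symbol>TARGET</symbol>. Assumes groups are never nested.
sub groups_containing {
    my ($xml, $target) = @_;
    my @ids;
    # /s lets .*? span newlines; being non-greedy, it stops at the first
    # closing tag, so each match covers exactly one group
    while ($xml =~ m{<symbolgroupdef\s+id="([^"]*)"\s*>(.*?)</symbolgroupdef>}gs) {
        my ($id, $body) = ($1, $2);
        # \Q...\E matches the symbol literally; requiring </symbol> right
        # after it stops a partial symbol such as QQ from matching QQQ
        push @ids, $id if $body =~ m{<symbol>\s*\Q$target\E\s*</symbol>};
    }
    return @ids;
}

# Inline copy of the question's sample. To read the real file instead:
#   open my $fh, '<', '/data/xmlconfig/config.xml' or die $!;
#   my $sample = do { local $/; <$fh> };   # slurp the whole file
my $sample = <<'XML';
    <symbolgroupdef id="bin_11-QQQQ">
      <symbol>QQQ</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>
XML

# Symbol from the command line; defaulting to AAPL is only for the demo
my $target = @ARGV ? $ARGV[0] : 'AAPL';
print "$_\n" for groups_containing($sample, $target);
```

Run with AAPL, this prints bin_6-AAPL and bin_7. Slurping the file trades memory for simplicity, which is usually fine for a small config file like this one.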

2 Answers


Here's a solution based on the idea of remembering the id attribute of the most recent <symbolgroupdef> element. It stores the id in $sgline, although you can store the whole line if you want. When a line turns up whose symbol element contains the requested value, it prints $sgline.

#!/usr/bin/perl
use warnings; 
use strict;

my $id = $ARGV[0];   # the symbol to look for, e.g. AAPL
die "Usage: $0 SYMBOL\n" unless defined $id;

# uncomment these to use your file
#my $sym_file = "/data/xmlconfig/config.xml";
#open my $sym_fh, '<', $sym_file or die $!;

my $sgline = '';

# change DATA to $sym_fh to use your file
while (<DATA>) {
    # match the symbolgroupdef element
    if (m#<symbolgroupdef id="(.+?)">#) {
        $sgline = $1; # or store the whole line using $sgline = $_;
    }
    # match a symbol element containing the requested symbol
    # (\Q...\E makes any regex metacharacters in it match literally)
    elsif (m#<symbol>\Q$id\E</symbol>#) {
        print "$sgline\n";
    }
}


__DATA__
    <symbolgroupdef id="bin_11-QQQQ"> 
      <symbol>QQQ</symbol> 
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>

Output when run with AAPL as the argument:

bin_6-AAPL
bin_7
i alarmed alien
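The answer above prints only the group id, and notes that the whole line could be stored instead. The question asked for both the symbolgroupdef line and the matching symbol line, so a variant of the same line-by-line approach might look like the sketch below. It assumes one element per line, as in the sample; it also forgets the current group at each closing tag, so a stray symbol outside any group is ignored. The helper names matching_lines and strip_ws are hypothetical, and the demo reads from an in-memory filehandle rather than the real config file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove leading and trailing whitespace (incl. newline)
sub strip_ws {
    my $s = shift;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: read $fh line by line, remember the most recent
# <symbolgroupdef> line, and collect that line plus each <symbol> line
# holding exactly $symbol. Assumes one element per line.
sub matching_lines {
    my ($fh, $symbol) = @_;
    my ($group_line, @out);
    while (my $line = <$fh>) {
        if ($line =~ /<symbolgroupdef\s+id="[^"]*"\s*>/) {
            $group_line = $line;             # start of a new group
        }
        elsif ($line =~ m{</symbolgroupdef>}) {
            undef $group_line;               # group closed
        }
        elsif (defined $group_line
               && $line =~ m{<symbol>\s*\Q$symbol\E\s*</symbol>}) {
            push @out, strip_ws($group_line), strip_ws($line);
        }
    }
    return @out;
}

my $sample = <<'XML';
    <symbolgroupdef id="bin_11-QQQQ">
      <symbol>QQQ</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>
XML

# Demo on an in-memory filehandle; pass $sym_fh from the question instead
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $symbol = @ARGV ? $ARGV[0] : 'AAPL';   # AAPL default is demo-only
print "$_\n" for matching_lines($fh, $symbol);
```

With AAPL this should print the four lines listed in the question: the bin_6-AAPL group line, its symbol line, the bin_7 group line, and its AAPL symbol line.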

Don't use a regex to parse XML; use a real XML parser instead.

I'd recommend using XML::LibXML:

use strict;
use warnings;

use XML::LibXML;

my $xml = XML::LibXML->load_xml(IO => \*DATA);

for my $group ($xml->findnodes(q{//symbolgroupdef/symbol[text()='BIDU']/..})) {
    print $group->getAttribute('id'), "\n";
}

__DATA__
<root>
    <symbolgroupdef id="bin_11-QQQQ"> 
        <symbol>QQQ</symbol> 
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
        <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
        <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
        <symbol>AAPL</symbol>
        <symbol>IBM</symbol>
    </symbolgroupdef>
</root>

Output (the symbol BIDU is hard-coded in the XPath expression):

bin_7-BIDU
Miller
  • I did not notice that. However, I'd still point out [How can I install Perl modules without root privileges?](http://stackoverflow.com/q/3735836/1733163) before recommending any regex solution, but yours might work for him. – Miller Oct 08 '14 at 18:06