I have an XML file from which I am extracting some values with a regex. The XML file looks like this:

<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>

I want to retrieve the value 17'h00baf from the "SomeAddr" tag. I match the regex "SomeAddr" to reach that row in the file, and then use the index and substr functions to retrieve the value with the code below:

my $i = index($row,">");
my $j = index($row,"<"); 
$Size_in_bits = substr $row,$i+1,$j-$i-3;

But after doing this I do not get 17'h00baf. Instead I get 17'h01191. With the same approach I can extract other values that are decimal or strings; I face this problem only with hexadecimal values. Can somebody please tell me what is wrong with this approach?

Zobia Kanwal
rikki
  • I don't see how you could possibly get `17'h01191` from the information you gave us. Please, post a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). – Dada Mar 28 '19 at 09:58
  • Regardless, you probably [shouldn't use regex to parse XML](https://stackoverflow.com/a/1732454/4990392), and instead use a [XML parser](https://stackoverflow.com/questions/487213/whats-the-best-xml-parser-for-perl). – Dada Mar 28 '19 at 10:00
  • @rikki: You might also consider accepting a few more [answers to your previous questions](https://stackoverflow.com/users/10004688/rikki?tab=questions) if you want people to help you here :-) – Dave Cross Mar 28 '19 at 10:25
  • Some call it [summoning the daemon](https://www.metafilter.com/86689/), others refer to it as [the Call for Cthulhu](https://blog.codinghorror.com/parsing-html-the-cthulhu-way/) and few just [turned mad and met the Pony](https://stackoverflow.com/a/1732454/8344060). In short, never parse XML or HTML with a regex! Did you try an XML parser such as `xmlstarlet`, `xmllint` or `xsltproc`? – kvantour Apr 04 '19 at 09:01

2 Answers

Please don't parse XML with regexes. Use a proper XML parser.

But, ignoring that advice temporarily, I don't get the behaviour you describe when testing your code.

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
  next unless /<SomeAddr>/;

  my $i = index($_, ">");
  my $j = index($_, "<");
  my $Size_in_bits = substr $_, $i + 1, $j - $i - 3;
  say $Size_in_bits;
}

__END__
<Instance>Fuse_Name</Instance>
<Id>8'hed</ID>
<SomeAddr>17'h00baf</SomeAddr>
<PSomeAddr>17'h00baf</PSomeAddr>

And running it:

$ perl parsexml
17'h00baf

Of course, I've had to guess at what a lot of your code looks like because you didn't give us a complete example to test. So it looks likely that your problems are in bits of the code that you haven't shown us.

(My guess would be that there's another <SomeAddr> tag in the file somewhere.)
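A related possibility, offered only as a sketch: the question says the row is found by matching "SomeAddr", and that unanchored pattern also matches the <PSomeAddr> row. If the real file has a different value there, and the loop overwrites a single variable, the later row would win. The <PSomeAddr> value and the helper name last_value below are hypothetical, invented purely to illustrate the effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: the <PSomeAddr> value below is invented so that the
# two tags can be told apart; it is not taken from the asker's real file.
my @rows = (
    "<Instance>Fuse_Name</Instance>\n",
    "<Id>8'hed</Id>\n",
    "<SomeAddr>17'h00baf</SomeAddr>\n",
    "<PSomeAddr>17'h01191</PSomeAddr>\n",
);

# Same index/substr extraction as in the question. Returns the value from
# the LAST row matching $pattern, as a loop overwriting one variable would.
sub last_value {
    my ($pattern, @lines) = @_;
    my $value;
    for my $row (@lines) {
        next unless $row =~ $pattern;
        my $i = index($row, ">");
        my $j = index($row, "<");
        $value = substr $row, $i + 1, $j - $i - 3;
    }
    return $value;
}

# The loose pattern also hits <PSomeAddr>; the strict one does not.
print "loose:  ", last_value(qr/SomeAddr/,   @rows), "\n";
print "strict: ", last_value(qr/<SomeAddr>/, @rows), "\n";
```

With this made-up data the loose pattern ends up with the <PSomeAddr> value and the strict pattern with the <SomeAddr> value. Printing `$.` alongside each matching row of the real file would show whether anything similar is happening there.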

Dave Cross

Never, ever use a regex to parse HTML/XML/.... Always use a proper parser and then implement your algorithm in the DOM domain.

My solution shows how to parse the XML and then extract the text content from every <SomeAddr> node in the document (the // in the XPath expression matches at any depth, not only at the top level).

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $doc = XML::LibXML->load_xml(IO => \*DATA);
my $xpc = XML::LibXML::XPathContext->new();

# register default NS
$xpc->registerNs('default', 'http://some.domain.com/some/path/to');

foreach my $node ($xpc->findnodes('//default:SomeAddr', $doc)) {
    print $node->textContent, "\n";
}

exit 0;

__DATA__
<Root xmlns="http://some.domain.com/some/path/to">
  <Instance>Fuse_Name</Instance>
  <Id>8'hed</Id>
  <SomeAddr>17'h00baf</SomeAddr>
  <PSomeAddr>17'h00baf</PSomeAddr>
</Root>

Test run

$ perl dummy.pl
17'h00baf
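
The script above assumes a default namespace on the root element. The snippet in the question shows neither a root element nor a namespace, so a minimal sketch for the un-namespaced case might look like the following. The <Root> wrapper is an assumption added only to make the fragment well-formed, and the @values array is there just to collect the results.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Assumed wrapper: <Root> is not in the question; it only makes the
# fragment a well-formed document.
my $xml = <<'XML';
<Root>
  <Instance>Fuse_Name</Instance>
  <Id>8'hed</Id>
  <SomeAddr>17'h00baf</SomeAddr>
  <PSomeAddr>17'h00baf</PSomeAddr>
</Root>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Without a namespace, a plain XPath expression is enough; the element
# name must match exactly, so <PSomeAddr> is not selected.
my @values = map { $_->textContent } $doc->findnodes('//SomeAddr');

print "$_\n" for @values;
```

Because XPath compares whole element names, there is no risk of <PSomeAddr> being picked up by accident.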
Stefan Becker
  • @Stefan Becker: I am still getting that, and it only happens with the row containing the hexadecimal number that starts with 17 (i.e. 17'h00baf). Other hexadecimal numbers such as 8'hed and 16'h037a cause no problem. – rikki Mar 28 '19 at 11:04
  • @rikki: Telling us that doesn't really help. We can't do any more unless you post a complete, runnable, example that demonstrates the problem. – Dave Cross Mar 28 '19 at 14:15