I have an XML file, for example:

<title> hello <name> hi </name> <street> id </street> this is xml file </title>

Here the parent node is title. I want to extract the text inside the parent node, removing the inner tags.

I have tried it with a regex. But is there any way other than a regex, such as some XML-based functions, to remove the tags? Note: the tag names are not known beforehand.

Hi, I have tried this, using the same XML:

use XML::Simple; 
use Data::Dumper; 

my $simple = XML::Simple->new(); 
my $data = $simple->XMLin('XMLRemoval.xml'); 
my %oldHash = %$data; my %newHash = (); 

while ( my ($key, $innerRef) = each %oldHash ) 
{ 
    $newHash{$key} = @$innerRef[1]; 
} 

foreach $key ( keys %newHash ) 
{ 
    print $newHash{$key}; 
}

And I am getting the error : Can't use string (" id ") as an ARRAY ref while "strict refs"

Sishanth
  • Have you checked the module XML::Simple? I think you will get what you want by using it. – kailash19 Oct 26 '12 at 11:41
  • I have searched, but I can't find anything in XML::Simple for removing the tags. Can you please tell me if you know a method to do it? – Sishanth Oct 26 '12 at 12:10
  • If you know how hashes work, you can get what you want by using the XMLin method. Again, please try it yourself first and paste your code (error, effort) so that we can help. Asking for code is not good, as people here are here to help and not to do your tasks. It might also result in downvotes. – kailash19 Oct 26 '12 at 12:14
  • Hi I have tried this, I used the same xml use XML::Simple; use Data::Dumper; my $key; my $simple = XML::Simple->new(); my $data = $simple->XMLin('XMLRemoval.xml'); my %oldHash = %$data; my %newHash = (); while ( my ($key, $innerRef) = each %oldHash ) { $newHash{$key} = @$innerRef[1]; } foreach $key ( keys %newHash ) { print $newHash{$key}; } And I am getting the error : Can't use string (" id ") as an ARRAY ref while "strict refs" – Sishanth Oct 29 '12 at 05:14

4 Answers

use strict;
use warnings;

use feature qw/say/;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<title> hello <name> hi </name> <street> id </street> this is xml file </title>');

say $dom->all_text;
# hello hi id this is xml file

say $dom->at('title')->all_text;
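# prints the same text as above, because all_text includes the text of child elements;
# use ->text instead to get only the element's own text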

You get the idea

jshy

The most brutal way is:

use strict;
use warnings;

use feature 'say';


my $text = '<title> hello <name> hi </name> <street> id </street> this is xml file </title>' ;

$text =~ s|<.+?>||g;
say "Text |$text|";

But, as you probably know, it is not OK to parse HTML (or XML) with a regex.
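
To illustrate why the regex approach is fragile, here is a minimal sketch using only core Perl. The `strip_tags` helper and the `note` attribute are hypothetical names introduced for this illustration; the second input assumes an attribute value containing `>`, which XML permits.

```perl
use strict;
use warnings;

# Hypothetical helper: strip anything that looks like a tag,
# using the same pattern as the snippet above.
sub strip_tags {
    my ($xml) = @_;
    (my $text = $xml) =~ s|<.+?>||g;
    return $text;
}

# Simple input: every tag is removed and only the text remains.
my $simple = '<title> hello <name> hi </name> this is xml file </title>';

# Assumed input with a '>' inside an attribute value.
# The pattern stops at the first '>', so part of the tag leaks into the text.
my $tricky = '<title note="a>b"> hello </title>';

print '[', strip_tags($simple), "]\n";
print '[', strip_tags($tricky), "]\n";
```

The second line of output still contains attribute debris, which is the kind of case an XML parser handles and a pattern like this does not.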

Tudor Constantin

Based on your requirement, you can try this. I have used the file you provided in your example.

Here we define (in effect, rename) the key under which XML::Simple stores the text content of the root element. You need to choose a key that does not occur as a tag name in your XML (I have chosen root-contents).

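#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $simple = XML::Simple->new();
# Store text content under 'root-contents' instead of the default 'content' key
my $data = $simple->XMLin('XMLRemoval.xml', 'ContentKey' => 'root-contents');
print Dumper $data;    # inspect the parsed structure

my $val = $data->{'root-contents'};
if (ref($val) eq 'ARRAY')
{
    # Several text chunks: print each one
    foreach (@$val)
    {
        print "$_\n";
    }
}
else
{
    # A single text chunk
    print "$val\n";
}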

Please go through the XML::Simple documentation; there are a lot of options you can tweak to fit your requirement.

I will leave it to you to debug your own code, find what caused the error and work out how to solve it (the error message is self-explanatory) :).

kailash19
  • Hi, the code works fine. I have another question. With `my $path = "D:\DocRepos\Tasks\PerlPrograms\XMLRemoval.xml"; my $data = $simple->XMLin($path, 'ContentKey' => 'root-contents');` I get an error when I assign the path to a variable: File does not exist: D:DocReposTasksPerlProgramsXMLRemoval.xml. Please help me. – Sishanth Oct 29 '12 at 13:34
  • Your code contains '\' inside a double-quoted string; you need to escape it with another '\'. Alternatively, you can enclose your path in single quotes. Please go through the Perl documentation about it. This error is pretty simple and you should have been able to resolve it. – kailash19 Oct 30 '12 at 04:42
  • I am not asking questions without trying. I have tried the above cases, but it is still not working. – Sishanth Oct 30 '12 at 05:25
  • Are you sure it's not working? What error did you get? Have you used it like `$path="D:\\DocRepos\\Tasks\\PerlPrograms\\XMLRemoval.xml"`? – kailash19 Oct 30 '12 at 05:29
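
The quoting advice in the comments above can be sketched as follows. This is a minimal illustration in core Perl; the variable names are made up for the example, and the forward-slash variant assumes a Perl on Windows that accepts `/` as a path separator, which is generally the case.

```perl
use strict;
use warnings;

# Double quotes: every backslash has to be doubled, otherwise
# sequences such as \D or \T lose their backslash.
my $escaped = "D:\\DocRepos\\Tasks\\PerlPrograms\\XMLRemoval.xml";

# Single quotes: backslashes are kept literally
# (only \\ and \' are treated specially).
my $literal = 'D:\DocRepos\Tasks\PerlPrograms\XMLRemoval.xml';

# Assumption: forward slashes are accepted by the file functions on Windows.
my $forward = 'D:/DocRepos/Tasks/PerlPrograms/XMLRemoval.xml';

print "escaped and literal are identical\n" if $escaped eq $literal;
print "$literal\n";
print "$forward\n";
```

Either of the first two strings could then be passed to XMLin in place of the unescaped double-quoted path.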

You can use XML::XSH2:

open file.xml ;
echo (/title) ;       # hello  hi   id  this is xml file
echo /title/text() ;  # hello     this is xml file 
choroba
  • I am new to Perl and I don't know how to use XML::XSH2. When I tried your code, it said that it can't locate XML::XSH2. I am writing the script in Notepad++. – Sishanth Oct 26 '12 at 11:28
  • [FAQ: What's the easiest way to install a missing perl module?](http://stackoverflow.com/questions/65865/whats-the-easiest-way-to-install-a-missing-perl-module) – memowe Oct 26 '12 at 11:31