I have XML files as input. Those files contain tags such as:
First Instance:
<xref ref-type="bibr" rid="perl-ch006-bib080"><sup>80</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch006-bib082"><sup>82</sup></xref>
Second Instance:
<xref ref-type="bibr" rid="perl-ch001-bib009"><sup>9</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch001-bib012"><sup>12</sup></xref><sup>,</sup><xref ref-type="bibr" rid="perl-ch001-bib057"><sup>57</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch001-bib059"><sup>59</sup></xref>
In the two instances above, the first has the numbers 80 and 82, with 81 missing; the second has the ranges 9-12 and 57-59. The – is the entity for a hyphen (-). I need to copy the entire content of each XML file and add the missing numbers of each range at that position.
The output should be as follows. For the first instance (i.e. in the pattern 80 81-82):
<xref ref-type="bibr" rid="perl-ch006-bib080"><sup>80</sup></xref><xref ref-type="bibr" rid="perl-ch006-bib081"><sup>81</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch006-bib082"><sup>82</sup></xref>
For the second instance (i.e. in the pattern 9 10 11-12, 57 58-59):
<xref ref-type="bibr" rid="perl-ch001-bib009"><sup>9</sup></xref><xref ref-type="bibr" rid="perl-ch001-bib010"><sup>10</sup></xref><xref ref-type="bibr" rid="perl-ch001-bib011"><sup>11</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch001-bib012"><sup>12</sup></xref><sup>,</sup><xref ref-type="bibr" rid="perl-ch001-bib057"><sup>57</sup></xref><xref ref-type="bibr" rid="perl-ch001-bib058"><sup>58</sup></xref><sup>–</sup><xref ref-type="bibr" rid="perl-ch001-bib059"><sup>59</sup></xref>
All changes should be made in the output files, so that the input files are left untouched.
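The transformation described above can be sketched as a single substitution. This is only a sketch under stated assumptions, not a tested solution for real data: the sub names expand_ranges and missing_xrefs are hypothetical, the rid is assumed to end in -bib followed by the zero-padded reference number, that number is assumed to equal the label inside <sup>, and the dash may be a plain hyphen, a decoded en dash (U+2013), or one of two assumed entity spellings. It matches the dash where it stands rather than converting every hyphen in the file, so attributes such as ref-type are left alone.

```perl
use strict;
use warnings;

# Dash forms accepted between the two xrefs. The entity spellings are
# assumptions; \x{2013} only matches if the text was decoded on input.
my $DASH = qr/(?:-|\x{2013}|&\#x2013;|&ndash;)/;

# Build the xrefs for the numbers strictly between $from and $to.
sub missing_xrefs {
    my ($type, $prefix, $width, $from, $to) = @_;
    my $out = '';
    for my $n ($from + 1 .. $to - 1) {
        $out .= sprintf '<xref ref-type="%s" rid="%s%0*d"><sup>%d</sup></xref>',
            $type, $prefix, $width, $n, $n;
    }
    return $out;
}

# Insert the missing xrefs right after the first xref of each range.
# The second xref is only looked ahead at, so it stays where it is.
sub expand_ranges {
    my ($xml) = @_;
    $xml =~ s{
        (<xref\s+ref-type="(bibr?)"\s+rid="([^"]*-bib)(\d+)"><sup>(\d+)</sup></xref>)
        (?=<sup>$DASH</sup><xref\s+ref-type="bibr?"\s+rid="\3\d+"><sup>(\d+)</sup></xref>)
    }{
        $1 . missing_xrefs($2, $3, length($4), $5, $6)
    }gex;
    return $xml;
}

# Usage with the first instance from the question.
binmode STDOUT, ':encoding(UTF-8)';
my $sample = '<xref ref-type="bibr" rid="perl-ch006-bib080"><sup>80</sup></xref>'
           . "<sup>\x{2013}</sup>"
           . '<xref ref-type="bibr" rid="perl-ch006-bib082"><sup>82</sup></xref>';
print expand_ranges($sample), "\n";
```

The backreference to the rid prefix means a range is only expanded when both ends belong to the same chapter, and running the sub a second time on its own output changes nothing, because no numbers are missing any more.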
Code:
#!/usr/bin/perl
use strict;
use Cwd;
use File::Basename;
use File::Copy;
my $path1=getcwd;
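opendir(INP, "$path1/Input") or die "Cannot open Input directory: $!";
my @out = grep(/\.xml$/, readdir(INP));   # escape the dot so only *.xml matches
closedir INP;                             # directory handles need closedir, not close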
foreach my $final(@out)
{
my $filetobecopied = "Input\/".$final;
my $newfile = $final;
copy($filetobecopied, $newfile) or die "File cannot be copied.";
}
opendir DIR, $path1 or die "cant open dir";
my @files = grep /(.*?)\.(xml)$/,(readdir DIR);
closedir DIR;
open(F6, ">Ref.txt");
print F6 "FileName\tMatchedString\tOutput\n";
foreach my $f(@files)
{
open(F1, "<$f") or die "Cannot open file: $f";
my $data=join("", <F1>);
close F1;
my $xml_list=$data;
#print F6 $xml_list."\n";
$xml_list=~s/–/-/gs;
$xml_list=~s/–/-/gs;
while($xml_list=~m/(<xref ref-type="(bibr|bib)" rid="(.*?)-ch(\d+)-(bibr|bib)(\d+)">(<sup>)?(\d+)(<\/sup>)?<\/xref><sup>(-)+<\/sup>)(<xref ref-type="(bibr|bib)" rid="(.*?)-ch(\d+)-bib(\d+)">(<sup>)?(\d+)(<\/sup>)?<\/xref>)/igs)
{
my $i;
my $xref=$1;my $bibr=$2;
my $rid=$3; my $ch=$4;my $bib=$6;my $hyp=$10;
my $num=$8;
my $xref1=$11;
my $num1=$17;
if($hyp=~m/(-)/gs)
{
my $counter=$num;
while($counter<=$num1) #for($counter=$num;$counter<=$num1;$counter++)
{
#print F6 "<xref ref-type=\"$bibr\" rid=\"$rid\-ch$ch\-$bibr$counter\"><sup>$counter<\/sup><\/xref>,"."\n";
$counter++;
}
}
}
$xml_list=~s/&orb;/\(/g;
$xml_list=~s/&crb;/\)/g;
$xml_list=~s/-/–/gs;
$xml_list=~s/-/–/gs;
open(OUT, ">$path1/Output/$f") or die "Cannot write to Output/$f: $!";
print OUT $xml_list;
close OUT;
}
foreach my $del(@files)
{
unlink $del
}
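The copy-then-delete steps above could be avoided by reading each file straight from Input and writing the result to Output. A minimal sketch, assuming both directories already exist and the files are UTF-8 encoded; process_dir is a hypothetical name, and the transform is passed in as a code reference so the file handling stays separate from the XML editing:

```perl
use strict;
use warnings;
use File::Spec;

# Read every *.xml file in $in_dir, apply $transform to its content,
# and write the result under the same name in $out_dir.
# The input files are only ever opened for reading.
sub process_dir {
    my ($in_dir, $out_dir, $transform) = @_;

    opendir(my $dh, $in_dir) or die "Cannot open directory $in_dir: $!";
    my @names = sort grep { /\.xml$/i } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $src = File::Spec->catfile($in_dir,  $name);
        my $dst = File::Spec->catfile($out_dir, $name);

        # UTF-8 is an assumption about the files' encoding.
        open(my $in, '<:encoding(UTF-8)', $src) or die "Cannot read $src: $!";
        my $data = do { local $/; <$in> };   # slurp the whole file
        close $in;

        open(my $out, '>:encoding(UTF-8)', $dst) or die "Cannot write $dst: $!";
        print {$out} $transform->($data);
        close $out or die "Cannot close $dst: $!";
    }
    return @names;
}

# Usage (the directory names follow the layout in the script above):
# process_dir('Input', 'Output', sub { my ($xml) = @_; return $xml });
```

Because nothing is copied into the working directory, there is nothing to unlink afterwards.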
Any help would be appreciated.