I am fairly new to Perl. I want to update a specific node value, LOCATIONID, in this XML file wherever it matches a value that I read in from a text file.

Sample XML File

<?xml version="1.0" encoding="UTF-8"?>
<TestImportFile xmlns="urn:TestImportFile-schema">
    <LOCATION SOURCEID="Yes">
        <LOCATIONID>F16-000100</LOCATIONID>
        <LOCATIONCATEGORY>UFO ABDUCTEE</LOCATIONCATEGORY>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>DTG2QP</LOCALNAME>
            <ASSIGNEDTO>BearmanJ</ASSIGNEDTO>
            <ASSIGNEDTODATETIME>2016-02-02T07:59:00</ASSIGNEDTODATETIME>
            <CASE>
                <CASEVALUE>21</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>35</CASEVALUE>
            </CASE>
        </LOCAL>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>F4T2557</LOCALNAME>
            <READINGBY>BearmanJ</READINGBY>
            <READINGDATETIME>2016-04-03T06:48:00</READINGDATETIME>
            <CASE>
                <CASEVALUE>83</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>40</CASEVALUE>
            </CASE>
        </LOCAL>
    </LOCATION>
    <LOCATION SOURCEID="Yes">
        <LOCATIONID>F16-000101</LOCATIONID>
        <LOCATIONCATEGORY>UFO ABDUCTEE</LOCATIONCATEGORY>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>ZGV4TF</LOCALNAME>
            <ASSIGNEDTO>BearmanJ</ASSIGNEDTO>
            <ASSIGNEDTODATETIME>2016-02-02T07:59:00</ASSIGNEDTODATETIME>
            <CASE>
                <CASEVALUE>34</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>67</CASEVALUE>
            </CASE>
        </LOCAL>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>E5Y7456</LOCALNAME>
            <READINGBY>BearmanJ</READINGBY>
            <READINGDATETIME>2016-04-03T06:48:00</READINGDATETIME>
            <CASE>
                <CASEVALUE>53</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>20</CASEVALUE>
            </CASE>
        </LOCAL>
    </LOCATION>
    <LOCATION SOURCEID="Yes">
        <LOCATIONID>F16-000102</LOCATIONID>
        <LOCATIONCATEGORY>UFO ABDUCTEE</LOCATIONCATEGORY>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>ZGV4TF</LOCALNAME>
            <ASSIGNEDTO>BearmanJ</ASSIGNEDTO>
            <ASSIGNEDTODATETIME>2016-02-02T07:59:00</ASSIGNEDTODATETIME>
            <CASE>
                <CASEVALUE>34</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>67</CASEVALUE>
            </CASE>
        </LOCAL>
        <LOCAL BIT="Test Case File">
            <LOCALNAME>E5Y7456</LOCALNAME>
            <READINGBY>BearmanJ</READINGBY>
            <READINGDATETIME>2016-04-03T06:48:00</READINGDATETIME>
            <CASE>
                <CASEVALUE>53</CASEVALUE>
            </CASE>
            <CASE>
                <CASEVALUE>20</CASEVALUE>
            </CASE>
        </LOCAL>
    </LOCATION>
</TestImportFile>

Sample Text File

  F16-000100:2B-16-NOR-0005-J3
  F16-000101:2B-16-NOR-0005-J4
  F16-000102:2B-16-NOR-0005-J5

I can read the text file into an array, but I cannot work out how to search the XML file for each match and then update that value in the XML file with the desired value.

My script to read in text file:

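my $filename = '1TestData.txt';
# three-argument open with lexical filehandles
open(my $in, '<', $filename) or die "Could not read from $filename, program halting: $!";
my $output = '1TestOutput.txt';
open(my $out, '>', $output) or die "Can't create $output: $!\n";
while (my $line = <$in>) {
    chomp $line;
    # field 0 is the current LOCATIONID, field 1 is its replacement
    my @fields = split /:/, $line;
    print "$fields[0]\n";
}
close $in;
close $out;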

I want to replace each LOCATIONID value with the second field of the matching line in the text file. For example:

<LOCATIONID>F16-000100</LOCATIONID>

Desired Result:

<LOCATIONID>2B-16-NOR-0005-J3</LOCATIONID>

without touching anything else in the XML file.

simbabque
BCope
  • Are the LOCATIONIDs unique, or are they duplicated? – Sobrique Mar 08 '17 at 14:50
  • the locationids are unique, but there could be 1000 to 3000 of them in this xml file. Same for the text file. There should be a one-to-one match. – BCope Mar 08 '17 at 16:00
  • OK. The example I've given should work, but if your XML is particularly large, you may face memory issues. If you do, then there's another approach we can use (incremental parsing) but I wouldn't suggest that unless it's necessary. – Sobrique Mar 08 '17 at 16:29
  • Excellent. Thank you so much. I will try it on this file with 1000 locations; and if that works I can limit my files to that size. This script will only be run once a week or so. – BCope Mar 08 '17 at 16:37
  • I think that the memory issue arose, as the screen output did not complete. I do not really need to see the result except for the final xml file. I can verify it there. What would the incremental parsing approach look like? Why would you recommend against it? – BCope Mar 08 '17 at 16:50
  • I think that's probably best with a separate question. It's not a problem exactly, but it's a bit counterintuitive. If your resultant XML looks ok though, then that worked fine. – Sobrique Mar 08 '17 at 19:13
  • After running it and waiting about 4 minutes for it to complete; everything looks good. I can live with a 5 minute processing time. Thanks again for your help. Greatly greatly appreciated. – BCope Mar 09 '17 at 19:45

1 Answer

Please don't use regular expressions for this. XML is contextual, and regular expressions aren't.

So with that in mind, use a parser. I like XML::Twig (XML::LibXML is also pretty good; XML::Simple is discouraged).

With a parser you have XPath available, which does a similar matching job to a regular expression but is better suited to XML.

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;


#parse your file.
my $xml = XML::Twig -> new -> parsefile('sample1.xml');

#open the replacements file for reading
open ( my $input, '<', 'file2.txt') or die $!;
#turn it into key-values for replacement
#probably a bit overkill, as you can just do this iteratively instead. 
my %replace = map { s/\s+//g; split /:/ } <$input>;
close ( $input );

#print for debug
print "Using for replacement:\n ";
print Dumper \%replace;

#iterate all of the search terms
foreach my $search ( keys %replace ) { 
   #use XPATH to find location ID that matches.
   #note - this only finds the _first_ location ID. To do 'all' you'd 
   #need to loop. 
   my $node = $xml -> get_xpath("//LOCATIONID[string()=\"$search\"]",0);
   #skip IDs with no match in the XML, otherwise set_text is called on undef
   next unless $node;
   $node -> set_text($replace{$search});
}

#set output formatting
$xml -> set_pretty_print('indented_a');
#print to screen
$xml -> print;

#for output:
open ( my $output, '>', 'transformed.xml' ) or die $!;
print {$output} $xml -> sprint;
close ( $output );

If there are multiple instances of a particular location ID, you'd need to use:

$_ -> set_text($replace{$search}) for $xml -> get_xpath("//LOCATIONID[string()=\"$search\"]");

instead, as this will search for all nodes matching that particular ID and replace all of them.
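
With a few thousand IDs, running one XPath query per ID means searching the document again for every key. A possible alternative, shown here only as a minimal sketch, is to walk the LOCATIONID elements once and look each one up in the replacement hash. The helper name replace_location_ids and the inline sample data are hypothetical, made up for illustration. The sketch assumes the whole document still fits in memory and that IDs missing from the hash should be left untouched.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of XML::Twig): replace LOCATIONID text
# in a single pass. $xml_string is the XML document as a string and
# $replace is a hashref mapping old ID => new ID.
sub replace_location_ids {
    my ( $xml_string, $replace ) = @_;
    my $twig    = XML::Twig->new->parse($xml_string);
    my $changed = 0;

    # visit every LOCATIONID element once and do a hash lookup
    for my $loc ( $twig->root->descendants('LOCATIONID') ) {
        my $id = $loc->text;
        next unless exists $replace->{$id};    # leave unknown IDs alone
        $loc->set_text( $replace->{$id} );
        $changed++;
    }
    return ( $twig->sprint, $changed );
}

# Illustrative data only, cut down from the sample in the question.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<TestImportFile xmlns="urn:TestImportFile-schema">
    <LOCATION SOURCEID="Yes">
        <LOCATIONID>F16-000100</LOCATIONID>
    </LOCATION>
    <LOCATION SOURCEID="Yes">
        <LOCATIONID>F16-000999</LOCATIONID>
    </LOCATION>
</TestImportFile>
XML

my %replace = ( 'F16-000100' => '2B-16-NOR-0005-J3' );
my ( $result, $changed ) = replace_location_ids( $sample, \%replace );
print $result;
print "\nReplaced $changed LOCATIONID value(s)\n";
```

Because the lookup is driven by the elements rather than by the keys, this also handles a location ID that appears more than once without an extra loop.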

Community
Sobrique
  • There are definitely multiple location ids. The xml file will contain about 1000 to 3000 location ids. – BCope Mar 08 '17 at 15:59
  • It works! Sobrique THANK YOU that works like a champ! Thank you so much. – BCope Mar 08 '17 at 16:14
  • Sorry, multiple as in 'not unique' - multiple matching a particular key. – Sobrique Mar 08 '17 at 16:30
  • Multiple as in matching a particular key. Your solution appears to be working and doing exactly what I want. I am about to test on a txt file with 1000 records; using the loop. – BCope Mar 08 '17 at 16:32