Perl replace strings in XML file if unequal

Question

I have an XML file with values like the following:

<values val1="4124" val2="21341"></values>
<values val1="1234" val2="231"></values>
<values val1="814" val2="8943"></values>

I would like to make val2 assume the value of val1 in all cases where they're unequal.

So the above would become:

<values val1="4124" val2="4124"></values>
<values val1="1234" val2="1234"></values>
<values val1="814" val2="814"></values>

Here is what I have:

perl -pi -e 's,val2=\"[0-9]*\,val1=\"[0-9]*\,g;' *

I am mainly having trouble understanding how to substitute the value of val1 to val2. My above code will do this:

<values val1="4124" val1="4124"></values>

This problem becomes pretty trivial if you use an XML parser. — Matt Jacob, Jan 13 '16 at 16:23
@MattJacob I'm not too familiar with string replacements in perl let alone XML parsers in perl tbqh. I want to try to get down the basic pattern matching constructs if possible. — nietsnegttiw, Jan 13 '16 at 16:25
Perhaps these [search results](http://stackoverflow.com/search?q=%5Bperl%5D+libxml+or+twig) will be helpful for you. — Matt Jacob, Jan 13 '16 at 16:34
[On parsing XML with regex](http://stackoverflow.com/a/1732454/2566198) — Sobrique, Jan 13 '16 at 16:36
Re " I'm not too familiar with string replacements in perl let alone XML parsers in perl tbqh.", If you don't know either, all the more reason to use the right approach!! — ikegami, Jan 13 '16 at 16:43
As a warning, the insults directed at other users must stop right now. I have removed all of the offending comments, so let's keep this polite and on topic. — Brad Larson, Jan 13 '16 at 17:54

ikegami · Accepted Answer · 2016-01-13T17:27:05.790

3

perl -MXML::LibXML -e'
   my $doc = XML::LibXML->new()->parse_file($ARGV[0]);
   for my $node ($doc->findnodes("//values")) {
      $node->setAttribute("val2", $node->getAttribute("val1"));
   }
   print($doc->toString());
' infile.xml >outfile.xml

$parser->parse_file parses the file.
$doc->findnodes finds nodes in the document.
$node->getAttribute gets a node's attribute.
$node->setAttribute sets a node's attribute.

edited Jan 13 '16 at 17:27

answered Jan 13 '16 at 16:46

ikegami

Hmm. Code-only, one-line non-portable solution. What's not to like? – Borodin Jan 13 '16 at 17:16
@Borodin, What are you talking about? It is documented. The OP wants a one-liner. It is portable. At least we agree it is a solution. /// I added some extra documentation for people like you. – ikegami Jan 13 '16 at 17:22
@ikegami It might be a stretch to say the OP _wanted_ a one-liner. But since he presented that as his attempt, I don't think there's anything wrong with answering in kind. – Matt Jacob Jan 13 '16 at 17:52

Adam Taylor · Answer 2 · 2016-01-13T17:44:45.763

0

Note: what follows is definitely a hacky one-liner. If you want to do this properly, follow the advice in the other answer and use a proper XML parser.

While it would be cleaner to use an XML parser, if your input is literally as you have it, you might be able to get away with using regular expression matching groups.

From the documentation http://perldoc.perl.org/perlrequick.html#Extracting-matches

The grouping metacharacters () also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1 , $2 , etc.

It's worth reading the whole page. Then you can try:

perl -pi -e 's/<values val1="([0-9]+)" val2="[0-9]+"><\/values>/<values val1="$1" val2="$1"><\/values>/g;' *

edited Jan 13 '16 at 17:44

answered Jan 13 '16 at 16:39

Adam Taylor

I'm always ambivalent when someone posts an answer like this. Because on one hand, it's technically correct. But on the other, it's a bodge, and as such not as useful to future readers as an example of how to do it right. – Sobrique Jan 13 '16 at 16:43
I get what you (and everyone else) are saying but it's undeniable that perl/sed/whatever are useful for quick hacks like this. I will add a note to my answer because it's a fair point about future people finding this question. – Adam Taylor Jan 13 '16 at 17:42
Agreed. And I would be lying if I said I had never done it. But at the same time, it does lead to brittle solutions - one day it might just not cope with semantically identical XML and just break for no apparent reason. I have been burned by similar things that have made their way into production. – Sobrique Jan 14 '16 at 07:46
