I have a long XML file that contains many occurrences of the string

<div type="something">

I need to add the text id="NUMBER" to those strings only, where NUMBER is a value starting at 1 and incrementing by 1. My output should be:

<div id="1" type="something">
<div id="2" type="something">
<div id="3" type="something">
...

I would prefer to use Perl; can anyone help me?

Thank you, Stefania

I tried this:

use strict;
use warnings;

my $str = "<div id=\"";

my $i = 0;
$str =~ s/<div id=\"/'<div id="'.++$i/ eg;
print "$str";
  • **Try writing something yourself** and then if it doesn't work, show us specifically what you did so we can help you along. **You start it, and then we help. We don't write it for you.** Show us the actual code that you've tried, and then describe what happened and what's not right, and then we can help you from there. Chances are you'll get pretty close to the answer if you just try it yourself first. – Andy Lester Jun 03 '14 at 15:59
  • @user2044347 Instead of adding your code as a comment, edit your original question to include code to show what you've tried. – Christoffer Hammarström Jun 03 '14 at 16:13
  • Where are you reading the file? – Christoffer Hammarström Jun 03 '14 at 16:19
  • I run the script directly in BBEdit, with my file opened. – user2044347 Jun 03 '14 at 16:27
  • @user2044347 Nowhere in your snippet is there any code that reads files you have open in BBEdit. If you want something to happen, you must write code to make it happen. – Christoffer Hammarström Jun 03 '14 at 17:36

1 Answer

Regex-fu with XML files is akin to open-heart surgery with a Swiss army knife. Just because you can doesn't mean you should.

Since there are plenty of dedicated XML parsers to choose from, why not use one of those instead? Here's how one might do it using XML::LibXML:

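use strict;
use warnings;
use XML::LibXML;

# Parse the input file (replace 'file.xml' with the name of your file).
my $xml = XML::LibXML->new->parse_file( 'file.xml' );

# Select every <div> whose type attribute is "something".
my @wanted_nodes = $xml->findnodes( '//div[@type="something"]' );

# Number the selected nodes in document order, starting from 1.
my $counter = 1;
for my $div ( @wanted_nodes ) {
    $div->setAttribute( 'id', $counter++ );
}

# Write the modified document to a new file.
$xml->toFile( 'new_file.xml' );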
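
To try the approach without touching any files, the same calls can be run on an in-memory string. This is only a minimal sketch, assuming XML::LibXML is installed; the sample document and the `number_divs` helper are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse an XML string, number the matching divs,
# and return the serialized document.
sub number_divs {
    my ($xml_text) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml_text );

    my $counter = 1;
    for my $div ( $doc->findnodes('//div[@type="something"]') ) {
        $div->setAttribute( 'id', $counter++ );
    }
    return $doc->toString;
}

# Made-up sample input; only the two "something" divs should get an id.
my $sample = <<'XML';
<root>
  <div type="something">a</div>
  <div type="other">b</div>
  <div type="something">c</div>
</root>
XML

print number_divs($sample);
```

Attribute order is not significant in XML, so the serializer may write the new id after type rather than before it.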
Zaid
  • "Could not create file parser context for file "file.xml": No such file or directory" – user2044347 Jun 03 '14 at 17:05
  • @user2044347 : I don't know what your XML file is called. Replace 'file.xml' with the name of your XML file. – Zaid Jun 03 '14 at 17:06
  • Sorry. The problem with XML parsers is that this is a POS-tagged file with XML annotation. That is, the XML annotation wraps a three-column, tab-separated text in which each word is followed by its part of speech and lemma. So when I run your code it gives me errors like "parser error : xmlParseEntityRef: no name & NOCAT &", because for me & is a word and not an entity. Another error is "Input is not proper UTF-8, indicate encoding !" (the file is ISO Latin-1). – user2044347 Jun 03 '14 at 18:10
  • OK, I think I can manage the & errors. How do I match any "div type" attribute value (I have several different ones)? If I use my @wanted_nodes = $xml->findnodes( '//div[@type=".*"]' ); it doesn't replace anything. – user2044347 Jun 03 '14 at 18:43
  • @user2044347 : `'//div[@type]'` should do it – Zaid Jun 04 '14 at 08:48
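
If the input cannot be made well-formed (for instance because of the bare & described in the comments above), a plain substitution is a possible fallback, with all the usual caveats about regexes and XML. The following is only a sketch under stated assumptions: every opening tag has the form `<div type="...">` with `type` as the first attribute, and `add_div_ids` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: number every <div type="..."> opening tag in a string,
# whatever the value of the type attribute is.
sub add_div_ids {
    my ($text) = @_;
    my $counter = 0;
    # Insert id="N" right after "<div" and keep the type attribute unchanged.
    $text =~ s/<div(\s+type="[^"]*")/'<div id="' . ++$counter . '"' . $1/ge;
    return $text;
}

# Made-up sample in the spirit of the tagged file: the bare & is left alone.
my $sample = join "\n",
    '<div type="something">',
    "&\tNOCAT\t&",
    '</div>',
    '<div type="other">',
    '</div>';

print add_div_ids($sample), "\n";
```

To apply this to a real file, one could read it through an encoding layer that matches the file (for example `'<:encoding(iso-8859-1)'` in `open`) and write the result to a new file.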