I have around 150 XML files in a folder that need to be updated with a new tag.

Current:

<entry key="mergeTemplates" value="false"/>
<entry key="sysDescriptions"/>

New:

  <entry key="mergeTemplates" value="false"/>
  <entry key="requestable">
    <value>
      <Boolean>true</Boolean>
    </value>
  </entry>
  <entry key="sysDescriptions"/>

I tried Java's "replace" method but wasn't able to get it to work. I tried the "sed" command on Unix as well.

Any suggestions on the best way or tool to accomplish this?

4 Answers

In general, you should not attempt to process XML data with line-oriented tools. Use something like xmlstarlet instead:

xmlstarlet ed -i "//entry[@key='sysDescriptions']" -t elem -n "new_entry" \
    -i "//new_entry" -t attr -n "key" -v "requestable" \
    --subnode "//new_entry" -t elem -n "value" \
    --subnode "//new_entry/value" -t elem -n "Boolean" \
    --subnode "//new_entry/value/Boolean" -t text -n "dummy" -v "true" \
    -r "//new_entry" -v "entry" input.xml

For the sake of readability, I inserted a new element called new_entry, and finally renamed it. Make sure that no such element exists in your input file.
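For comparison, here is a minimal Java sketch of the same parser-based idea using only the JDK's built-in DOM and XPath APIs (javax.xml). The class and method names are hypothetical, and it assumes, as in the question, that the entries are siblings under a common parent: it inserts the new entry before every `entry` whose key is `sysDescriptions`, like the xmlstarlet command above. Re-serializing through a Transformer may change cosmetic details such as whitespace and the XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RequestableInserter {

    // Parses the XML, inserts <entry key="requestable"><value><Boolean>true</Boolean></value></entry>
    // before every <entry key="sysDescriptions"> element, and serializes the result.
    static String insertRequestable(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList found = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//entry[@key='sysDescriptions']", doc, XPathConstants.NODESET);
        // copy the matches first so the tree is not modified while iterating the result
        List<Node> targets = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            targets.add(found.item(i));
        }

        for (Node target : targets) {
            Element bool = doc.createElement("Boolean");
            bool.setTextContent("true");
            Element value = doc.createElement("value");
            value.appendChild(bool);
            Element entry = doc.createElement("entry");
            entry.setAttribute("key", "requestable");
            entry.appendChild(value);
            // the new entry becomes the preceding sibling of the matched element
            target.getParentNode().insertBefore(entry, target);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><entry key=\"mergeTemplates\" value=\"false\"/>"
                + "<entry key=\"sysDescriptions\"/></root>";
        System.out.println(insertRequestable(xml));
    }
}
```

To process the 150 files, the same steps should work with `DocumentBuilder.parse(File)` and `new StreamResult(File)` in a loop over the directory, instead of strings.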

Michael Vehrs
  • If one just needs to process a bunch of specific files with well-known formatting, there is actually no reason to avoid fast and simple plain text processing. XML file content is a subset of generic text, after all. – Fedor Losev Dec 07 '16 at 12:27
  • I don't agree. `XML` is contextual, and regex isn't. Thus a regex solution will _always_ be brittle and hacky, because `XML` can change format in a bunch of perfectly valid ways that'll break regex messily. – Sobrique Dec 07 '16 at 12:33
  • I completely agree, if you develop a library or a production system. But if you merely need to update your specific files, with specific data, there isn't always a need to overcomplicate it and design all the bells and whistles. In this case there is no regex, just a find-and-replace of a text line. – Fedor Losev Dec 07 '16 at 12:39
  • I expanded my answer to explain why I think plain text processing is good enough here. – Fedor Losev Dec 07 '16 at 12:56

You've tagged it Perl, so I'll offer a Perl solution. The best general advice I can offer is to use a parser, because XML is a parsable language and good parsers exist. I particularly like XML::Twig for this sort of job (XML::LibXML is pretty good too, but doesn't do in-place editing).

I strongly urge avoiding regular expressions - XML is not well suited to parsing via regex, because it's contextual and regex isn't.

There are a bunch of perfectly valid changes you can make to XML, like self-closing (unary) tags, indentation and line splitting, that leave it semantically identical but break a regex messily. Thus a future change that someone makes, one that as far as they're concerned is valid or trivial, like reformatting the XML, will break things 'downstream' because your script doesn't handle it properly. Furthermore, XPath is a lot like regex, but it is contextual and thus well suited to XML parsing and processing.
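To make the brittleness point concrete, here is a small hypothetical Java demonstration: three serializations of the same `mergeTemplates` entry that any XML parser treats as equivalent, each checked once with an exact-line comparison (what a line-oriented tool effectively does) and once by reading the element and its attributes through the JDK's DOM parser. The class, constant, and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class EquivalentXmlDemo {

    static final String PREFIX_LINE = "<entry key=\"mergeTemplates\" value=\"false\"/>";

    // Semantically identical ways of writing the same element
    static final List<String> VARIANTS = List.of(
            "<entry key=\"mergeTemplates\" value=\"false\"/>",            // as in the question
            "<entry value=\"false\" key=\"mergeTemplates\" />",           // attribute order, extra space
            "<entry key='mergeTemplates'\n       value='false'></entry>"); // quotes, line split, end tag

    // Exact line comparison, as a line-oriented tool would do
    static boolean matchesLine(String xml) {
        return xml.lines().anyMatch(PREFIX_LINE::equals);
    }

    // Parser-based check: element name and attribute values, regardless of layout
    static boolean matchesParsed(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return root.getTagName().equals("entry")
                && root.getAttribute("key").equals("mergeTemplates")
                && root.getAttribute("value").equals("false");
    }

    public static void main(String[] args) throws Exception {
        for (String variant : VARIANTS) {
            System.out.println(matchesLine(variant) + " " + matchesParsed(variant));
        }
    }
}
```

Running it prints `true true` for the first variant and `false true` for the other two: the line comparison only recognizes the one exact spelling, while the parser sees the same element every time.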

#!/usr/bin/env perl
use warnings;
use strict;

use XML::Twig;

my $twig = XML::Twig -> parse (\*DATA); 

my $to_insert = XML::Twig::Elt -> new (   'entry', {key => "requestable"} );
$to_insert -> insert_new_elt ( 'value' ) -> insert_new_elt('Boolean', "true" );

print "Generated new XML:\n";
$to_insert -> print;

my $insert_after = $twig -> findnodes ('//entry[@key="mergeTemplates"]',0);
$to_insert -> paste ( after => $insert_after );

print "Generated XML:\n";
$twig -> set_pretty_print('indented'); 
$twig -> print;


__DATA__
<xml>
<entry key="mergeTemplates" value="false"/>
<entry key="sysDescriptions"/>
</xml>

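This can be adapted to update every file in a directory, parsing each one and writing the result back over the original with print_to_file. (XML::Twig's parsefile_inplace method can also do this, but then the whole document, including its end, has to be printed or flushed while parsing.)

#!/usr/bin/env perl
use warnings;
use strict;
use XML::Twig;

# twig handler: called with the twig and the matching <entry> element
sub insert_merge {
   my ( $twig, $insert_after ) = @_;

   my $to_insert = XML::Twig::Elt->new( 'entry', { key => "requestable" } );
   $to_insert->insert_new_elt('value')->insert_new_elt( 'Boolean', "true" );

   $to_insert->paste( after => $insert_after );
   return 1;
}

# glob finds the files; if you need something more extensive, use File::Find::Rule
foreach my $filename ( glob("/path/to/dir/*.xml") ) {
    # a fresh twig for each file
    my $twig = XML::Twig->new(
        twig_handlers => { 'entry[@key="mergeTemplates"]' => \&insert_merge },
        pretty_print  => 'indented',
    );
    $twig->parsefile($filename);
    # write the modified document back over the original file
    $twig->print_to_file($filename);
}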

Sobrique

This is by no means an efficient solution, but it should work just fine for 150 files. If you have an SSD, it should complete in the blink of an eye.

It assumes that each tag is on its own line and that the new tag should be inserted after every entry key="mergeTemplates" line. If that is not the case, the code can be modified slightly, for example to run a Matcher over chunks of the file instead of comparing whole lines, or to read two lines at a time to detect the second tag.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class TagInsertSample {

    public void addTextAfterLine(String inputFolder, String prefixLine,
            String text) throws IOException {
        // collect the .xml files first, so that replacing files below
        // cannot affect the directory listing being iterated
        List<Path> inputPaths = new ArrayList<>();
        try (DirectoryStream<Path> dirStream = Files.newDirectoryStream(
                Paths.get(inputFolder), "*.xml")) {
            for (Path path : dirStream) {
                if (Files.isRegularFile(path)) {
                    inputPaths.add(path);
                }
            }
        }
        for (Path inputPath : inputPaths) {
            Path outputTmpPath = inputPath.resolveSibling(
                    inputPath.getFileName() + ".tmp");
            // read line by line and write to the temporary output file
            try (BufferedReader inputReader = Files.newBufferedReader(
                    inputPath, StandardCharsets.UTF_8);
                    BufferedWriter outputWriter = Files.newBufferedWriter(
                            outputTmpPath, StandardCharsets.UTF_8)) {
                String line;
                while ((line = inputReader.readLine()) != null) {
                    outputWriter.write(line);
                    outputWriter.write('\n');
                    // trim() so that indentation before the tag does not matter
                    if (line.trim().equals(prefixLine)) {
                        // add text after prefix line
                        outputWriter.write(text);
                    }
                }
            }
            // replace the original file with the modified one
            // (unlike File.renameTo, Files.move throws if it fails)
            Files.move(outputTmpPath, inputPath,
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        final String inputFolder = "/tmp/xml/input";
        final String prefixLine = "<entry key=\"mergeTemplates\" value=\"false\"/>";
        final String newText =
                "<entry key=\"requestable\">\n"
                        + "    <value>\n"
                        + "      <Boolean>true</Boolean>\n"
                        + "    </value>\n"
                        + "</entry>\n";
        new TagInsertSample()
                .addTextAfterLine(inputFolder, prefixLine, newText);
    }
}
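The Matcher-based variant mentioned above could look roughly like the following sketch. It makes two simplifying assumptions: each file is small enough to read into memory whole (rather than in chunks), and the attributes use double quotes in the order shown in the question. The class name and the "already present" guard, which keeps a second run from inserting the entry twice, are hypothetical additions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTagInserter {

    // Tolerates extra whitespace (including line breaks) inside the tag,
    // but still assumes double quotes and key before value.
    static final Pattern MERGE_TEMPLATES = Pattern.compile(
            "<entry\\s+key\\s*=\\s*\"mergeTemplates\"\\s+value\\s*=\\s*\"false\"\\s*/>");

    // Appends newText right after every matching tag.
    static String insertAfterMergeTemplates(String content, String newText) {
        // hypothetical guard: skip content that already has the entry
        if (content.contains("key=\"requestable\"")) {
            return content;
        }
        Matcher m = MERGE_TEMPLATES.matcher(content);
        // $0 is the whole match; quoteReplacement escapes any $ or \ in newText
        return m.replaceAll("$0" + Matcher.quoteReplacement(newText));
    }

    // Reads one file, applies the insertion, and rewrites it only if it changed.
    static void processFile(Path file, String newText) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = insertAfterMergeTemplates(content, newText);
        if (!updated.equals(content)) {
            Files.write(file, updated.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Here `newText` should start with a newline (for example `"\n<entry key=\"requestable\">..."`) so that the new entry lands on its own line. The regex is more forgiving than an exact line comparison, but it is still a text match, so the parser caveats discussed in the other answers apply.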

You can also use an advanced editor with a find-and-replace-in-files command (e.g. Notepad++ on Windows, using the Extended search mode so that \n is treated as a newline). Just replace the line <entry key="mergeTemplates" value="false"/> with <entry key="mergeTemplates" value="false"/>\n followed by the new entry.
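In Java, the same literal find-and-replace is a single String.replace call on a file's contents, because replace (unlike replaceAll) treats both arguments as plain text. Note that Java strings are immutable, so replace returns a new string and leaves the original unchanged; the result has to be used or written back. A minimal sketch with hypothetical names, which, like the editor approach, only works if the line appears exactly as written:

```java
final class LiteralReplace {

    static final String OLD = "<entry key=\"mergeTemplates\" value=\"false\"/>";
    static final String NEW_ENTRY = "<entry key=\"requestable\">\n"
            + "  <value>\n"
            + "    <Boolean>true</Boolean>\n"
            + "  </value>\n"
            + "</entry>";

    // Appends the new entry after every exact occurrence of OLD.
    static String addRequestable(String content) {
        return content.replace(OLD, OLD + "\n" + NEW_ENTRY);
    }

    public static void main(String[] args) {
        String before = OLD + "\n<entry key=\"sysDescriptions\"/>";
        System.out.println(addRequestable(before));
    }
}
```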

Several notes here say that you should not process XML with text-processing tools. That is true if you are developing a generic system or library to process unknown files. However, to accomplish a task on your own files with a known format, there is no need for XML complications, and text processing fits just fine.

To preempt the question "how do you know it is not going to be a generic system?": I'm pretty confident that nobody developing a generic production system would ask for "java, perl, Unix sed or any other tool".

Fedor Losev

With sed these things are relatively easy:

You can select a line (an address) with a regex:

/^<entry key="mergeTemplates" value="false"\/>$/

Note that the / has to be escaped, because it is also the delimiter of the address; nothing else in this line is special in a basic regex. The address also uses ^ (start of line) and $ (end of line), so the whole line has to match.

Once you have an address, you can run a command on the matching lines. In this case we want the append command, a:

/^<entry key="mergeTemplates" value="false"\/>$/a\
<entry key="requestable">\
  <value>\
    <Boolean>true</Boolean>\
  </value>\
</entry>

That's the full sed script. To run it, save it in a file (insert_xml.sed) and use sed -f:

sed -f insert_xml.sed input_file.xml

Use the -i flag to edit files in place: it is either -i (GNU) or -i '' (FreeBSD). Using -i.bak (GNU) or -i .bak (FreeBSD) will also create a backup named after the original file plus .bak.
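If you drive the edit from Java instead of sed, the same keep-a-backup idea can be sketched as a small helper. This is only a rough analogue of -i.bak, not how sed implements it: the class and method names are hypothetical, the whole file is read into memory, and the transformation is passed in as a function (for example, one of the text substitutions discussed on this page).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

final class InPlaceEditor {

    // Copies the original to <name>.bak, then overwrites the original
    // with the transformed content. Returns the path of the backup.
    static Path editWithBackup(Path file, UnaryOperator<String> transform)
            throws IOException {
        Path backup = file.resolveSibling(file.getFileName() + ".bak");
        Files.copy(file, backup, StandardCopyOption.REPLACE_EXISTING);
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, transform.apply(content).getBytes(StandardCharsets.UTF_8));
        return backup;
    }
}
```

For example, `InPlaceEditor.editWithBackup(path, s -> s.replace(oldLine, oldLine + "\n" + newEntry))` would leave the untouched original next to the edited file, which makes it easy to diff the two before deleting the backups.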

Then write a for loop over the files that need the update:

for file in *.xml; do
  sed -i.bak -f insert_xml.sed "$file"
done
Andreas Louv
  • Wasn't my DV, but at a guess because parsing `XML` with `regex` is a very bad practice, because you're using regular expressions on a language that _isn't_ regular. – Sobrique Dec 07 '16 at 12:01
  • @Sobrique True, but sometimes it's OK for simple substitutions. – Andreas Louv Dec 07 '16 at 12:55