I am not a pro developer and need a simple solution. I have tried using fart.exe within a Windows batch file to accomplish this, but I am having trouble finding the exact lines I need in order to replace line breaks. Here is what I am trying to do in an XML file.

I need to go from this (a few lines in the middle of a larger file):

<meta name="xyz:moreinfohere" content="some content"/>
            <meta name="abc:evenmoreinfo" content="more content
and here is where
the problem lies"/>
            <meta name="abc:infoagain" content="this is confusing"/>
            <meta name="xyz:blahblah" content="please help"/>

to this:

            <meta name="xyz:moreinfohere" content="some content"/>
            <meta name="abc:evenmoreinfo" content="more content&#xa;and here is where&#xa;the problem lies"/>
            <meta name="abc:infoagain" content="this is confusing"/>
            <meta name="xyz:blahblah" content="please help"/>

The data in these fields will vary, and this is a fictitious example. Basically, I am trying to replace the line breaks with the &#xa; code, but only on certain lines, as you can see. I have managed to use fart.exe to replace all instances of \n\r, but I can't figure out how to replace only the needed ones. Not every line starts with "meta...". However, every line in the files is supposed to end with ">"; it's the only constant/fixed character on every line in the files. Please help! I am open to anything that works in a standard Windows batch file (fart, java, etc.).

Jeff Bunn
  • Use `powershell.exe` then! Not only can it search and replace using regular expressions or standard strings, it also has built-in support for `xml`. – Compo Apr 06 '20 at 19:29
  • @Compo, A standard XML parser won't work here. A compliant parser must replace the line feeds with spaces, which is why the OP wants to change the line feeds to `&#xA;`. This would cause a parser to return a line feed. – ikegami Apr 06 '20 at 19:56
  • @ikegami This works perfectly! Thank you. One small question: The result adds the code as `&#xA;`. Even when I change line 15 in fix.pl, I can't seem to change it. I need either `&#xa;` or `&#10;`. How can I update that? – Jeff Bunn Apr 06 '20 at 21:12
  • Those three character references are all equivalent. But literally replace `&#xA;` with one of the others in the code if that's what you want. – ikegami Apr 06 '20 at 21:16
  • lol I was modifying a copy of the pl file. Thanks again man! Works perfectly. – Jeff Bunn Apr 06 '20 at 21:25

1 Answer

As you found out, a standard-compliant XML parser will replace a line feed in an attribute's value with a space unless the line feed is encoded using a character reference (e.g. &#xA;). (Reference)
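To see that normalization in action, here is a small sketch in Python (used instead of Perl only because its standard library ships an XML parser). The one-element documents are made up from the question's example, and `attribute_value` is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def attribute_value(xml_text: str) -> str:
    """Parse a one-element document and return its `content` attribute."""
    return ET.fromstring(xml_text).get("content")


# Same attribute twice: once with a literal line feed, once with the
# line feed written as a character reference.
literal = attribute_value('<meta content="more content\nand here"/>')
encoded = attribute_value('<meta content="more content&#xA;and here"/>')

print(repr(literal))  # 'more content and here'   (line feed became a space)
print(repr(encoded))  # 'more content\nand here'  (line feed survived)
```

The literal line feed is lost before the application ever sees the value, which is why the fix has to be applied to the raw text rather than through a parser.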

So while I would normally recommend using a proper XML parser, that won't work here because we're trying to fix broken XML (i.e. XML that means something different than what we want it to mean).

We could write a proper XML parser that simply doesn't perform the line feed to space substitution and use that to fix the file, but that's a lot of work. The following is probably sufficient.

Assumptions:

  • All attribute values that need fixing use double-quotes (not single-quotes).
  • Double-quotes are always found in pairs in the documents to be fixed.
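
Both assumptions can be roughly checked before running the fix. The following Python sketch is only a heuristic, not a validator: `check_assumptions` is a hypothetical helper, and the single-quote test can raise a false alarm if a double-quoted value happens to contain text like `= 'a` followed by a line break.

```python
import re


def check_assumptions(xml_text: str) -> list:
    """Return a list of warnings; an empty list means nothing was detected."""
    warnings = []

    # Double-quotes in pairs implies an even number of them overall.
    if xml_text.count('"') % 2 != 0:
        warnings.append("odd number of double quotes")

    # Look for a single-quoted attribute value that spans a line break.
    # Such a value would not be fixed by a double-quote based approach.
    if re.search(r"=\s*'[^'\"]*\n[^'\"]*'", xml_text):
        warnings.append("single-quoted value containing a line break")

    return warnings


print(check_assumptions('<meta content="more content\nand here"/>'))  # []
```

An empty list does not prove the file is safe to fix; it only means neither of the two obvious problems was spotted.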

fix.pl:

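use strict;
use warnings;

local $/;                                # Slurp mode: read each file as one string.
while (<>) {
   while (1) {
      /\G ( [^"]+ ) /xgc                 # Text outside quotes:
         and print $1;                   #    copy it unchanged.

      /\G \z /xgc                        # End of input:
         and last;                       #    done with this file.

      /\G ( " [^"]* " ) /xgc             # A complete double-quoted string:
         and do {
            print $1 =~ s/\n/&#xA;/rg;   # Encode its line feeds (/r needs Perl 5.14+).
            next;
         };

      die("Unbalanced quotes");          # An opening quote with no closing quote.
   }
}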

Usage:

perl fix.pl file_to_fix.xml >fixed_file.xml

or

perl -i.bak fix.pl file_to_fix.xml

The latter modifies the file in-place after making a backup.

After you use this tool, use a file comparison tool (e.g. Beyond Compare) to make sure the fix was properly applied.
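
If you would rather script that check than eyeball a diff, one possible approach is to undo the encoding and confirm you get the original text back. This is a Python sketch under the assumption that the original file contained no `&#xA;` references of its own (if it did, the check reports a mismatch even though the fix was fine). `differs_only_by_encoding` is a hypothetical helper name.

```python
def differs_only_by_encoding(original: str, fixed: str) -> bool:
    """True if turning every &#xA; in `fixed` back into a line feed
    reproduces `original` exactly."""
    return fixed.replace("&#xA;", "\n") == original


original = '<meta content="more content\nand here"/>\n<meta content="ok"/>\n'
fixed = '<meta content="more content&#xA;and here"/>\n<meta content="ok"/>\n'

print(differs_only_by_encoding(original, fixed))  # True
```

For real files, read the contents of the original (or the `.bak` backup) and the fixed file into strings first and pass them to the function.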

ikegami