
I'm new to Perl scripting, but I need to do a large amount of regex find-and-replaces across hundreds of files.

I came across this website which recommends the Perl command perl -p -i -e 's/oldstring/newstring/g' * to process all files, and then perl -p -i -e 's/oldstring/newstring/g' `find ./ -name *.html` to restrict that to certain files.

My goal is to find all *.csproj and *.vbproj files and update a .dll reference so that it points to a new path.

Those are both XML file types.

The text I'm replacing is

<Reference Include="log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821, processorArchitecture=MSIL">
  <SpecificVersion>False</SpecificVersion>
</Reference>

with

<Reference Include="log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821, processorArchitecture=MSIL">
  <SpecificVersion>False</SpecificVersion>
  <Private>True</Private>
  <HintPath>..\..\..\..\ExternalDLLs\log4net.dll</HintPath>
</Reference>

The command I have so far is

perl -p -i -e 's/<Reference Include="log4net, (?:.*?[\t\s\n\r])*?<\/Reference>/<Reference Include="log4net, Version=1\.2\.10\.0, Culture=neutral, PublicKeyToken=1b44e1d426115821, processorArchitecture=MSIL"><SpecificVersion>False<\/SpecificVersion><Private>True<\/Private><HintPath>\.\.\\\.\.\\\.\.\\\.\.\\ExternalDLLs\\log4net\.dll<\/HintPath><\/Reference>/g'  `find . -type f \( -name "*.vbproj" -or -name "*.csproj" \)`

Which seems to try and work, but it just ends up deleting all of my *.vbproj and *.csproj files.

I can't figure out why my script is deleting files.

Any help?

Edit: it prints this out per file

Can't do inplace edit on ./Middletier/TDevAccess/AmCad.Components.TDevAccess.csproj: No such file or directory.

Edit 2: I'm using Bash on Ubuntu on Windows, if that matters.

Could this be related?

Andrew Diamond

2 Answers


I'd suggest you're going to trip yourself up in two different ways if you're not really careful.

  • Parsing XML with regex is a bad idea. It's messy, because regexes aren't context-aware, whereas XML is.
  • Perl has a perfectly good File::Find module, which means you don't need to shell out to the find command.

I don't know specifically why you're having a problem, but I'd guess it's because the find command is generating linefeeds, and you're not stripping them?

Anyway, I'd suggest that you do neither, and use XML::Twig and File::Find::Rule to do this job just within perl.

Something like:

#!/usr/bin/perl
use strict;
use warnings;

use File::Find::Rule;
use XML::Twig;

#setup the parser - note, this may reformat (in valid XML sorts of ways).
my $twig = XML::Twig->new(
   pretty_print => 'indented',

   #set a handler for 'Reference' elements - to insert your values.
   twig_handlers => {
      'Reference' => sub {
         $_->insert_new_elt( 'Private' => 'True' );
         $_->insert_new_elt(
            'HintPath' => '..\..\..\..\ExternalDLLs\log4net.dll' );

         #flush is needed to write out the change.
         $_->flush;
      }
   }
);

#use rules to find suitable files to alter.
foreach my $xml_file (
   File::Find::Rule->or(
      File::Find::Rule->name('*.csproj'),
      File::Find::Rule->name('*.vbproj'),
   )->in('.')
  )
{
   print "\nFound: $xml_file\n";

   #do the parse.
   $twig->parsefile_inplace($xml_file);
}

Following on from comments - if you want to extend this to match on a Reference attribute, there are two possibilities. Either set a handler on the specific xpath:

twig_handlers => {
   'Reference[@Include="log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821, processorArchitecture=MSIL"]' => sub {
      $_->insert_new_elt( 'Private' => 'True' );
      $_->insert_new_elt(
         'HintPath' => '..\..\..\..\ExternalDLLs\log4net.dll' );

      #flush is needed to write out the change.
      $_->flush;
   }
}

This selects based on attribute content (but bear in mind the above is quite long and convoluted).

Alternatively - the handler 'fires' for each reference you encounter, so you can build a test.

my $twig = XML::Twig->new(
   pretty_print => 'indented',

   #set a handler for 'Reference' elements - to insert your values.
   twig_handlers => {
      'Reference' => sub {
         #note - instead of 'eq' you can do things like regex tests.
         #the // '' guards against Reference elements that have no Include attribute.
         if ( ( $_->att('Include') // '' ) eq "log4net, Version=1.2.10.0, Culture=neutral, PublicKeyToken=1b44e1d426115821, processorArchitecture=MSIL" ) {
              $_->insert_new_elt( 'Private' => 'True' );
              $_->insert_new_elt( 'HintPath' => '..\..\..\..\ExternalDLLs\log4net.dll' );
         }

         #flush is needed to write out the change.
         $_->flush;
      },
   }
);
Sobrique
  • I haven't run this yet, but from a quick overview, it looks like it's finding an XML tag `Reference`, and adding the children `HintPath` and `Private`, correct? If so, how do I limit it to find tags that have an attribute set to a certain value? – Andrew Diamond Sep 02 '16 at 16:12
  • ie: `Reference` tags that have `Include="log4net, Version=1.2.10.0...` – Andrew Diamond Sep 02 '16 at 16:13
  • Easily enough. Bear with me, I'll update the example. http://xmltwig.org/xmltwig/quick_ref.html – Sobrique Sep 02 '16 at 16:21

perl -pi processes the input files line by line. Your substitution contains a regex that tries to match text spanning multiple lines, so it can never match. You can activate "slurp" mode with the -0777 flag (i.e. perl -0777 -pi -e '.....'), which reads each file into memory whole. Keep -i and -e as separate switches: bundled as -pie, the e is taken as the backup suffix for -i instead of introducing the code. Of course, you need to make sure that you don't have any huge files in that directory. I don't know why the files get deleted. perl -i does rename the original files, but that doesn't appear to be the problem here.

Another thing to note is that the find ... command substitution will break if any file name contains spaces, so you may want to set something like IFS=$'\n' before executing the command.

redneb