3

I thought I had this figured out, but I want to find all occurrences in a file where there is some text to delete between two double quotes.

I need to find a match first, then delete everything from the opening double quote through the match and on to the closing double quote. I don't want to match just any text between two double quotes, because not every quoted string in the file is one I want to delete.

I used something like this:

perl -p -i.bak -e s/bar/foo/g bar.xml

first to do a find and replace that worked. Then I went to:

perl -p -i.bak -e s/..\/..\/bar\//g bar.xml

and that deleted everything up to bar, but I need to continue all the way to the second double quote and I'm not sure how to do that with Perl.

I assume a regex will be involved, but nothing I've tried has worked. The part up to bar will always be the same, but the text after it varies. The part I want to delete always ends at the second double quote, and more text follows after that.

James Drinkard
  • 4
    Can there be escaped quotes within the quotes (`"a 2\" by 4\" piece of wood"`)? – Tim Pietzcker Mar 06 '12 at 18:38
  • What string are you trying to match? Including the quotes. – TLP Mar 06 '12 at 18:38
  • There won't be any other quotes in-between the two quotes, only text. Unfortunately, I can't post real data, but it would be similar to this: "../../../XXX/XX-XXXX-XXX-XXXXXXX-X.XXX" – James Drinkard Mar 06 '12 at 18:44
  • XML? Might be worth posting a sample - there is probably a better way. (doesn't have to be real data. The structure is important, the content is not) – Sobrique Feb 21 '16 at 09:55

3 Answers

5
s/"[^"]*foo[^"]*"//g

works if there are no escaped quotes between the actual quotes, and if you want to remove a quoted string that contains foo:

"      # Match a quote
[^"]*  # Match any number of characters except quotes
foo    # Match foo
[^"]*  # Match any number of characters except quotes
"      # Match another quote
Tim Pietzcker
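As a minimal sketch of the substitution above applied to data shaped like the asker's sample: the helper name `delete_quoted_containing` and the sample line are made up for illustration, and `../../bar/` is assumed to be the fixed marker. `\Q...\E` is used so that the dots in the marker match only literal dots.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each double-quoted string that contains $marker.
# Assumes there are no escaped quotes inside the quoted strings.
sub delete_quoted_containing {
    my ($text, $marker) = @_;
    # \Q...\E quotes regex metacharacters, so "../" is matched literally.
    $text =~ s/"[^"]*\Q$marker\E[^"]*"//g;
    return $text;
}

# Invented sample line: one quoted string has the marker, one does not.
my $line = 'src="../../bar/XX-XXXX.XXX" alt="keep me"';
print delete_quoted_containing($line, '../../bar/'), "\n";
# prints: src= alt="keep me"
```

Note that the quotes themselves are deleted along with the text. If an empty pair of quotes should remain, using `""` as the replacement instead of an empty string would do that.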
  • Somehow that * wildcard is too greedy and it's changing the entire file. I'm a perl newbie, but this didn't work: perl -p -i.bak -e s/"[^"]*foo[^"]*"//g bar.xml – James Drinkard Mar 06 '12 at 18:57
  • @JamesDrinkard What OS are you using? I notice that you are leaving out quotes. Usually the code in a one-liner is quoted: `perl -e 'code'`. Single quote for linux, double quotes for Windows. If you leave out the quotes, you are screwing yourself over. – TLP Mar 06 '12 at 19:11
  • I'm using win7 64 bit with the latest version of ActivePerl for windows. With the quotes I still get garbage replacing all the text in the file ie ationroursrtitlratorratorsutilrorationroursrsutulisr... – James Drinkard Mar 06 '12 at 19:16
  • 1
    @JamesDrinkard You can't use double quotes in the regex in the Windows shell, (perhaps I should have mentioned that right away) because you can't escape double quotes in the windows shell (if I recall correctly). You'll need to write a small script. Just put the regex inside a script and call it with `perl -pi.bak script.pl bar.xml` – TLP Mar 06 '12 at 19:19
  • Thanks for the help and patience Tim and TLP. It works fine as a script. I don't know who said Perl was easy, but I disagree! – James Drinkard Mar 06 '12 at 19:29
  • 1
    @JamesDrinkard: Sorry for not replying to your comments (I was having dinner). Glad to hear you and TLP figured it out. I also don't know anybody who would say that Perl was easy (the only language in which programs look the same before and after RSA encryption, as the old saying goes). You might want to look into Python. I have never had [that much fun programming](http://xkcd.com/353/) before getting to know Python. – Tim Pietzcker Mar 06 '12 at 20:14
  • Plus 1 for the humor. I haven't looked into python yet, but thanks for the encouragement. – James Drinkard Mar 06 '12 at 20:21
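For readers who hit the same Windows quoting problem, here is one way the script route might look. This is a sketch, not the script the asker actually used: the sub name `edit_in_place` and the marker `../../bar/` are assumptions. It sets `$^I` inside the script, which has the same effect as `-i.bak`, so it can be run as `perl script.pl bar.xml` with no regex on the command line.

```perl
use strict;
use warnings;

# Hypothetical marker: the fixed text that always precedes the part to delete.
my $marker = '../../bar/';

# Edit each named file in place, keeping a .bak copy (like -i.bak).
sub edit_in_place {
    my @files = @_;
    # Localizing $^I and @ARGV turns on in-place editing for the <> loop below.
    local ($^I, @ARGV) = ('.bak', @files);
    while (my $line = <>) {
        # Remove every quoted string that contains the marker.
        $line =~ s/"[^"]*\Q$marker\E[^"]*"//g;
        print $line;    # goes to the file being edited, not to the screen
    }
    return;
}

# Do nothing when no file names are given.
edit_in_place(@ARGV) if @ARGV;
```

Because the double quotes now live in a file rather than on the command line, the shell never sees them.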
2

Some people were asking about escaped quotes. There are a couple of tricks here. You want to ignore escaped quotes like \", but not quote characters that have an escaped escape, like \\". To ignore the first, I use a negative lookbehind. To keep the second, I temporarily change every \\ to 😺 (SMILING CAT FACE WITH OPEN MOUTH). If you have 😺 in your data, choose something else.

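use v5.14;
use utf8;
use charnames qw(:full);

# emit UTF-8 so the arrow and the placeholder print without "Wide character" warnings
binmode STDOUT, ':encoding(UTF-8)';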

my $regex = qr/
    (?<!\\) "  # a quote not preceded by a \ escape
    (.*?)      # anything, non greedily
    (?<!\\) "  # a quote not preceded by a \ escape
    /x;

while( <DATA> ) {
    # encode the escaped escapes for now
    s/(?:\\){2}/\N{SMILING CAT FACE WITH OPEN MOUTH}/g;
    print "$.: ", $_;

    while( m/$regex/g ) {
        my $match = $1;
        # decode the escaped escapes
        $match =~ s/\N{SMILING CAT FACE WITH OPEN MOUTH}/\\\\/g;
        say "\tfound → $match";
        }
    }

__DATA__
"One group" and "another group"
This has "words between quotes" and words outside
This line has "an \" escaped quote" and other stuff
Start with \" then "quoted" and "quoted again"
Start with \" then "quoted \" with escape" and \" and "quoted again"
Start with \" then "quoted \\" with escape"
Start with \" then \\\\"quoted \\" with escape\\"

The output is:

1: "One group" and "another group"
    found → One group
    found → another group
2: This has "words between quotes" and words outside
    found → words between quotes
3: This line has "an \" escaped quote" and other stuff
    found → an \" escaped quote
4: Start with \" then "quoted" and "quoted again"
    found → quoted
    found → quoted again
5: Start with \" then "quoted \" with escape" and \" and "quoted again"
    found → quoted \" with escape
    found → quoted again
6: Start with \" then "quoted 😺" with escape"
    found → quoted \\
7: Start with \" then 😺😺"quoted 😺" with escape😺"
    found → quoted \\
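A different technique, sketched here as an alternative and not as the method used above, is to consume backslash pairs as units so that no placeholder character is needed. The sub name `quoted_strings` is made up for illustration. The `\G` anchor makes each match start where the previous one ended, so escaped quotes outside a quoted string are skipped as well.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of each double-quoted string,
# treating a backslash plus the next character as one unit.
sub quoted_strings {
    my ($line) = @_;
    my @found;
    while ($line =~ /\G (?: \\. | [^"\\] )*      # skip text outside quotes
                     " ( (?: \\. | [^"\\] )* ) " # capture one quoted string
                    /gxs) {
        push @found, $1;
    }
    return @found;
}

# Single-quoted Perl string, so \" below is a literal backslash and quote.
my $line = 'This line has "an \" escaped quote" and other stuff';
print "found: $_\n" for quoted_strings($line);
# prints: found: an \" escaped quote
```

Since a backslash can only be consumed together with the character after it, `\\"` is read as an escaped backslash followed by a real quote, while `\"` is read as an escaped quote.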
brian d foy
0

Your input says the file is .xml, so I'm going to say what I usually do.

Use an XML Parser - I like XML::Twig because I think it's easier to get to grips with initially. XML::LibXML is good too.

Now, based on the question you're asking, it looks like you're trying to rewrite a file path within an XML attribute.

So:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

#my $twig = XML::Twig -> new -> parsefile ( 'test.xml');
my $twig = XML::Twig -> new -> parse ( \*DATA );

foreach my $element ( $twig -> get_xpath('element[@path]') ) {
   my $path_att = $element -> att('path');
   $path_att =~ s,/\.\./\.\./bar/,,g;
   $element -> set_att('path', $path_att);
}

$twig -> set_pretty_print('indented_a');
$twig -> print;
__DATA__
<root>
   <element name="test" path="/path/to/dir/../../bar/some_dir">
   </element>
   <element name="test2" nopath="here" />
   <element path="/some_path">content</element>
</root>

XML::Twig also quite usefully supports parsefile_inplace to work "sed style" to amend a file. The above is an illustration of the concept with some sample XML - with a clearer example of what you're trying to do, I should be able to improve it.

Community
Sobrique