
I'm looking for a way to replace straight quotes with fancy ones: "abc" -> «abc».

It works for me in simple cases. The next step is getting it to work with nested quotes as well: "abc "d e f" ghi" -> «abc «d e f» ghi»

$pk =~ s/
  "(                          # first quote, start capture
    [\p{Word}\.]+?            # at least one word character or period
    .*?\b[\.,?!]*?            # any chars up to a word boundary + optional punctuation
  )"                          # stop capture, closing quote
  /«$1»/xg;                   # replace with fancy quotes

I hoped the regex would match the 1st and 3rd quotes and replace them, and it does. I also hoped it would then match the 2nd and 4th, but it won't, because by then the search position has already moved past the 2nd quote. One solution is to run the same replacement repeatedly until fewer than 2 quote characters remain.
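The repeated-replacement idea can be sketched as a loop that reruns the substitution until it stops matching. This is only a sketch: the sub name `fancy_quotes_repeated` is made up for illustration, and the pattern is the one from the question, unchanged.

```perl
use strict;
use warnings;
use utf8;                               # the source contains literal « and »

# Hypothetical helper: rerun the question's substitution until no pair is left.
sub fancy_quotes_repeated {
    my ($text) = @_;
    # Each successful pass removes at least two straight quotes, so this ends.
    1 while $text =~ s/
        "(                      # opening quote, start capture
          [\p{Word}\.]+?        # at least one word character or period
          .*?\b[\.,?!]*?        # anything up to a word boundary + optional punctuation
        )"                      # end capture, closing quote
        /«$1»/xg;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';     # assumes a UTF-8 terminal
print fancy_quotes_repeated('"abc "d e f" ghi"'), "\n";   # «abc «d e f» ghi»
```

The first pass pairs the 1st and 3rd quotes, the second pass pairs the two that remain, and the third pass finds nothing and ends the loop.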

Is there a better way to achieve my goal? My approach won't work with a third level of nesting, but that is not my goal; I will stay with 2 levels.


NB! Replacing the opening and closing quotes in separate substitutions won't work, because then unpaired double quotes would be replaced too. I need to replace them only when they appear as a pair!

More examples:

"abc "d e f" -> «abc "d e f»
"abc"d e f" -> «abc"d e f»

This seems impossible:

"abc" d e f" -> «abc" d e f»
w.k
  • That's quite an ambiguous goal. Why shouldn't `«abc »d e f« ghi»` be just as valid...? – deceze Feb 17 '13 at 12:28
  • @deceze: because quotes are bound to words, they embrace words; inside the quotes there are no spaces next to the quote marks. – w.k Feb 17 '13 at 12:33
  • @w.k So basically, the space around your quotation marks is the way your fancy quote should point? – TLP Feb 17 '13 at 12:35
  • @TLP: It may be space or other boundary or end of string – w.k Feb 17 '13 at 12:37
  • @TLP: Consider the next title: "There were tree 7" players in game" – w.k Feb 17 '13 at 13:05
  • @w.k Ah, I see, yes, it can be an imperial measurement, nevermind. – TLP Feb 17 '13 at 13:09
  • @w.k There will be problems. You need an additional rule to disambiguate the edge cases. For example, both `<> players in the game"` and `<>` can be seen as valid strings. – TLP Feb 17 '13 at 13:15
  • @TLP: yes, this case is hard to handle. that's why i said this situation may be impossible to cover with – w.k Feb 17 '13 at 13:18
  • @w.k There really is no point in us trying to guess how to parse your data and you adding new rules to break our solutions over and over. You need to come up with some exhaustive test examples and assign rules to explain them. Then we can help you find a solution. – TLP Feb 17 '13 at 13:28
  • @TLP: I am sorry for being vague. I tried to trim my problem down as much as possible, but there seem to be more points that I considered trivial. – w.k Feb 17 '13 at 13:34

2 Answers


There is no general way to pair up nested double quotes. If your quotes are always next to the beginning or end of a word, then this may work. It replaces a double quote that precedes a non-space character with an opening quote, and one that follows a non-space character with a closing quote.

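use strict;
use warnings;
use utf8;                               # source contains literal « and »

binmode STDOUT, ':encoding(UTF-8)';     # emit the guillemets as UTF-8

my $string = '"abc "d e f" ghi"';

$string =~ s/"(?=\S)/«/g;               # quote before a non-space: opening
$string =~ s/(?<=\S)"/»/g;              # quote after a non-space: closing

print $string, "\n";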

Output:

«abc «d e f» ghi»
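To see where the "always next to a word" assumption matters, here is a small sketch that wraps the same two substitutions in a sub (the name `quotes_by_neighbor` is made up for illustration) and runs them on the nested example and on the inch-mark title from the comments.

```perl
use strict;
use warnings;
use utf8;                               # the source contains literal « and »

# Hypothetical wrapper around the two substitutions shown above.
sub quotes_by_neighbor {
    my ($text) = @_;
    $text =~ s/"(?=\S)/«/g;     # quote followed by a non-space: opening
    $text =~ s/(?<=\S)"/»/g;    # remaining quote preceded by a non-space: closing
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';     # assumes a UTF-8 terminal
print quotes_by_neighbor('"abc "d e f" ghi"'), "\n";
# «abc «d e f» ghi»
print quotes_by_neighbor('"There were tree 7" players in game"'), "\n";
# «There were tree 7» players in game»
```

In the second string the inch mark after the 7 follows a non-space character, so it is turned into a closing quote as well. Any unpaired quote that touches a word is converted in the same way.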
Borodin

You can use negative lookaround assertions to find the matching directions on your fancy quotes. The double negations help handle the edge cases (e.g. end/beginning of line). I used << and >> instead of your fancy quotes here for simplicity.

use strict;
use warnings;

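while (<DATA>) {
    # Not after a non-space and not before whitespace: opening quote.
    s/(?<!\S)"(?!\s)/<</g;
    # Not after whitespace and not before a non-space: closing quote.
    s/(?<!\s)"(?!\S)/>>/g;
    print;
}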

__DATA__
"abc "d e f" ghi"

Output:

<<abc <<d e f>> ghi>>
TLP
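As a rough sketch, the negative-lookaround substitutions from the answer above can be wrapped in a sub that emits real guillemets instead of << and >>. The sub name `quotes_by_whitespace` is made up for illustration; the patterns are unchanged.

```perl
use strict;
use warnings;
use utf8;                               # the source contains literal « and »

# Hypothetical wrapper: same lookarounds as above, with « » as replacements.
sub quotes_by_whitespace {
    my ($text) = @_;
    $text =~ s/(?<!\S)"(?!\s)/«/g;   # whitespace or start before it: opening
    $text =~ s/(?<!\s)"(?!\S)/»/g;   # whitespace or end after it: closing
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';     # assumes a UTF-8 terminal
print quotes_by_whitespace('"abc "d e f" ghi"'), "\n";   # «abc «d e f» ghi»
print quotes_by_whitespace('"abc"d e f"'), "\n";         # «abc"d e f»
```

A quote squeezed between two non-space characters, as in the second string, satisfies neither pattern and is left as it is, which matches the result the question asks for in that example.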