1

If they're either not followed by a comma or a ) (close bracket) or if a ( (open bracket) precedes it.

I need it to clean up an SQL import file with 50k+ rows that contains ' (apostrophe) in between ' (apostrophe) and ' (apostrophe). I could rewrite the PHP script to consider this issue, but the solution I am asking you about would save me a lot of time.

Example:

INSERT INTO `base__raw`
(`txtField1`, `txtField2`) 
VALUES 
('Good string', 'This is a good string'), 
('Bad string', 'aramäisch ab a' = Vater');

After "replacing" all said apostrophes by putting a backslash ahead of them:

INSERT INTO `base__raw`
(`txtField1`, `txtField2`) 
VALUES 
('Good string', 'This is a good string'), 
('Bad string', 'aramäisch ab a\' = Vater');
Doy D
  • 21
  • 4
  • You sould show examples of the string that you want to modify. – GMB Feb 05 '20 at 22:35
  • Do you mean that there's three apostrophes in a row? ''' Or just like 'this'string' ? – MrTrick Feb 05 '20 at 22:36
  • I'm assuming you are talking about using regex search and replace in Notepad++, correct? – pseudocoder Feb 05 '20 at 22:51
  • I think I came up with a good matching pattern for this: (?<=\('.*)'(?=.*',\W)|(?<=,\W'.*)'(?=.*'\)) This looks for the two quoted blocks of text inside the parentheses using lookahead and lookbehind, and an apostrophe within. – pseudocoder Feb 05 '20 at 23:15

1 Answers1

2

There's lookbehind and lookahead tricks that can be useful, but difficult to remember. Here's a solution that just uses simple matching and one step.

Update:

Find what: ([^\(,\s]\s*)'(\s*[^\),\s])

  • First match a character that's not ( or , then any spaces.
  • In the middle match the apostrophe '.
  • Then match any spaces then a character that's not ) or ,.

Replace with: \1\\'\2

  • Characters before the apostrophe
  • A backslash
  • An apostrophe
  • Characters after the apostrophe

Old Solution: Not recommended
This might answer the question strictly as posed, but it will not fix your provided sample.

Or you can replace in two simpler steps:

First:
Find what: '([^,]) An apostrophe, followed by a character that is not a comma.
Replace with: \\'\1 A backslash, an apostrophe, and that character

And then:
Find what: \(' An open bracket and then an apostrophe
Replace with: \(\\' An open bracket, a backslash, an apostrophe

MrTrick
  • 1,927
  • 12
  • 11
  • Thank you Wai Ha Lee, that's much clearer. – MrTrick Feb 05 '20 at 22:54
  • Thanks very much for the answers! This solution seems to work with maybe one flaw: In Notepad++ it marks the character after ' (apostrophe) as well. So I expect it to replace all marked characters, but just to be sure... Did I get you right that by `\1\\'\2` it will handle any 1 character that follows the apostrophes (except the mentioned ones of course) over from what it found to what it has to replace? Naturally it looks like it is doing too much hence my confusion. Maybe that's what a lookback would avoid? – Doy D Feb 09 '20 at 13:19
  • The two sets of brackets `(...)` in the "Find What" field *capture* everything inside that part of the matched string. Then `\1` and `\2` in the "Replace With" field put them back. – MrTrick Feb 10 '20 at 18:19
  • I agree the amount of matched/highlighted text looks concerning at first glance. You **could** do this without it matching/highlighting any of the extra letters using lookbehind/lookahead, but it's more complex and not strictly needed. – MrTrick Feb 10 '20 at 18:19
  • 1
    Provided solution works! – Doy D Mar 02 '20 at 14:23
  • it's work, thank you – Arvind Agrahari Jun 07 '21 at 07:50