
Input:

"Supermajority Vote for State Taxes or fees" or taxes or "ssd or ffF"

Expected output:

"Supermajority Vote for State Taxes or fees" | taxes | "ssd or ffF"

What I tried, but it does not handle multiple occurrences:

preg_replace("/(\".*\")\s+(or)\s+(.*)/", "$1 | $3", $input);
סטנלי גרונן
  • `str_replace (' or ',' | ')`? – Michel Jan 19 '18 at 16:38
  • @Michel it shouldn't replace `or` that is present inside double quotes – riteshtch Jan 19 '18 at 16:42
  • `or(?=(?:[^"]*"[^"]*")*[^"]*\Z)` then? Nicked from [this question](https://stackoverflow.com/questions/11502598/how-to-match-something-with-regex-that-is-not-between-two-special-characters) – Michel Jan 19 '18 at 16:52
  • Possible duplicate of [How to match something with regex that is not between two special characters?](https://stackoverflow.com/questions/11502598/how-to-match-something-with-regex-that-is-not-between-two-special-characters) – Michel Jan 19 '18 at 16:54
  • @Michel Why did you only find it now?! :) – splash58 Jan 19 '18 at 17:00

3 Answers


Check that the number of quotes remaining until the end of the string is even:

\bor\b(?=([^\"]|\"[^\"]+\")+$)

Demo and some explanation:

\b - word boundary

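(?= ... ) - positive lookahead: asserts that the enclosed expression matches the text that follows, without consuming it

([^\"]|\"[^\"]+\") - either a single non-quote character or a complete "quoted string"

+$ - repeated up to the end of the string, so every quote after the match belongs to a closed pair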
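Applied with preg_replace, the pattern above can be sketched as follows. This is only an illustration: the variable names are made up here, and it assumes the quotes in the input are balanced and that no quoted string is empty (the pattern requires at least one character between quotes).

```php
<?php
// Sketch only: the pattern is the one from this answer; variable names are illustrative.
$input = '"Supermajority Vote for State Taxes or fees" or taxes or "ssd or ffF"';

// Single-quoted PHP string, so the double quotes need no escaping here.
$regex = '~\bor\b(?=([^"]|"[^"]+")+$)~';

// Replace every standalone "or" that is followed only by complete quoted strings.
$output = preg_replace($regex, '|', $input);

echo $output, PHP_EOL;
```

For the input from the question this prints the expected output, with both unquoted occurrences replaced.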

splash58

There is probably a fix for the regex you give in your question. But what if you need a quote in your input?

"Supermajority Vote for \"State Taxes\" or \"fees\"" or taxes or "ssd or ffF"

Ok, so now you want to find the strings between quotes, unless the quote is preceded by a backslash. But what if you want a backslash at the end of a string?

"Supermajority Vote for State Taxes or fees\\" or taxes or "ssd or ffF"

So now you want to find the strings between quotes, unless it is preceded by a backslash, unless that backslash is preceded by another backslash.

You can continue like this, but patching the regex one special case at a time will never cover an arbitrary number of backslashes. To do this correctly, you'd need to build a lexer.
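Such a lexer can be quite small. The following is a minimal sketch of the idea, not code from this answer: the function name replaceOrOutsideQuotes is hypothetical, and it assumes that a backslash inside a quoted string escapes the next character, whatever that character is.

```php
<?php
// Hypothetical helper: replaces the standalone word "or" with "|" outside
// double-quoted strings. Inside quotes, a backslash escapes the next character.
function replaceOrOutsideQuotes(string $input): string
{
    $out = '';
    $inQuotes = false;
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];

        if ($inQuotes) {
            if ($ch === '\\' && $i + 1 < $len) {
                // Copy the backslash and the escaped character verbatim.
                $out .= $ch . $input[$i + 1];
                $i++;
            } else {
                if ($ch === '"') {
                    $inQuotes = false; // unescaped quote closes the string
                }
                $out .= $ch;
            }
            continue;
        }

        if ($ch === '"') {
            $inQuotes = true;
            $out .= $ch;
            continue;
        }

        // Outside quotes: accept "or" only as a whole word.
        $prevIsWord = $i > 0 && preg_match('/\w/', $input[$i - 1]) === 1;
        $nextIsWord = $i + 2 < $len && preg_match('/\w/', $input[$i + 2]) === 1;
        if (substr($input, $i, 2) === 'or' && !$prevIsWord && !$nextIsWord) {
            $out .= '|';
            $i++; // skip the "r"
            continue;
        }

        $out .= $ch;
    }

    return $out;
}

echo replaceOrOutsideQuotes('"Supermajority Vote for \"State Taxes\" or \"fees\"" or taxes or "ssd or ffF"'), PHP_EOL;
```

Because the scanner consumes a backslash together with the character after it, any number of consecutive backslashes is handled without extra cases.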

Daan
  • This is more of a theoretical "ponder this" than an actual answer. You're not wrong, but you aren't really coding up a solution either. If you want to recommend a tool/off-site resource, that's fine, post your suggestion as a comment under the question. – mickmackusa Jan 22 '18 at 01:11

A perfect example for (*SKIP)(*FAIL):

"[^"]+"(*SKIP)(*FAIL)|\bor\b

Replace each match with |; see a demo on regex101.com.


In PHP:
<?php

$string = '"Supermajority Vote for State Taxes or fees" or taxes or "ssd or ffF"';
$regex = '~"[^"]+"(*SKIP)(*FAIL)|\bor\b~';

$string = preg_replace($regex, '|', $string);

echo $string;
?>

Which yields

"Supermajority Vote for State Taxes or fees" | taxes | "ssd or ffF"


Broken down, the expression means:
"[^"]+"        # a complete "..." quoted string
(*SKIP)(*FAIL) # discard that match and resume searching after it
|              # or
\bor\b         # the word "or" with boundaries on both sides (so not the "or" in "for", "nor", etc.)


As @mickmackusa points out, you can even handle escaped double quotes; see a demo on regex101.com.
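One way to make the quoted-string part escape-aware is sketched below. The sub-pattern used here is a common idiom assumed for illustration, not necessarily the expression in the linked demo: inside quotes it accepts either a character that is neither a quote nor a backslash, or a backslash followed by any single character.

```php
<?php
// Sketch only: the escape-aware quoted-string sub-pattern is an assumption.
$input = '"Vote for \"State Taxes\" or \"fees\"" or taxes or "ssd or ffF"';

// After PHP's single-quote unescaping the regex reads:
//   "(?:[^"\\]|\\.)*"(*SKIP)(*FAIL)|\bor\b
$regex = '~"(?:[^"\\\\]|\\\\.)*"(*SKIP)(*FAIL)|\bor\b~';

// Quoted strings (including escaped quotes inside them) are skipped;
// only the remaining standalone "or" words are replaced.
$output = preg_replace($regex, '|', $input);

echo $output, PHP_EOL;
```

Note the doubled backslashes: a literal backslash in the regex needs `\\`, and each of those needs escaping again inside a single-quoted PHP string.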
Jan
  • @NikhilBhivgade Honestly. (*assuming you have balanced quoting and no escaped quoting to deal with) You can't do this task better than what Jan has posted. It is fastest, cleanest, and most direct. SO moderation doesn't like me to ask you to move the green tick but it certainly does grab the attention of new visitors/readers/researchers. So whether you move the green tick or not, you should definitely implement this method in your project because it is best. I am not hating on splash, these are just the facts. +1 from me. – mickmackusa Jan 22 '18 at 01:09
  • p.s. you can handle escaped double quotes like this: `~"(?:[^"]+|\\")*"(*SKIP)(*FAIL)|\bor\b~` – mickmackusa Jan 22 '18 at 01:15
  • @mickmackusa: Brilliant, thanks. Was wondering about the backslash thing (+1 for other answer here ;-)). – Jan Jan 22 '18 at 07:14