0

I have this string:

$var = "Foo (.* ) bar blla (.* ) b*l?a$ bla bla ";

I want to escape the * and ? and all other special characters, except where they occur in exactly this sequence:

"(.*)"

I wanted to use preg_quote($var, '/'), but it escapes all the special characters, and I only need the standalone special characters to be escaped. I want this result:

$var = "Foo (.* ) bar bla(.*) b\*l\?a\$ bla bla ";

I want to use the final $var (the result) in a preg_match that matches every (.*) against another string. The special characters, which in my case are these:

., \, ^, $, |, ?, *, +, (, ), [, ], {, }, and /

should be treated as normal text, so they should be escaped, while (.*) should not be escaped. Only the special characters above need escaping, because I will use $var in preg_match. Other characters do not need to be escaped.

preg_match("/" . $var . "/s", $anotherstring, $match);
Mana
  • What is "this shape"? Is it that it's inside parentheses, or is it exactly the string `(.*)`? – SamWhan Jun 13 '17 at 08:36
  • It s exactly the string (.*) – Mana Jun 13 '17 at 08:37
  • There is nothing escaped in your expected outcome (`/` is not an escape character). – axiac Jun 13 '17 at 08:51
  • How do you get the content of `$var`? – axiac Jun 13 '17 at 08:55
  • Yes, I have updated my question, but preg_quote doesn't work in my case. I get $var from another preg_replace that parses it and changes some parts of it to (.*) – Mana Jun 13 '17 at 09:15
  • @Mana I was smacked with a week-ban a while back and lost track of this question. Can you update your question to include 3 to 5 different `$var` samples? I am trying to wrap my brain around your question again. I recall asking if your parenthetical expressions ALWAYS contain `.*` If you provide three to five examples and your expected result for each, I should be able to confidently/accurately provide a pattern for you. If I can't provide a pattern that is better than ClasG's then I will let you know. – mickmackusa Aug 08 '17 at 23:06
  • Hey @mickmackusa , thank you for the help, I actually used your solution, I just did some modifications, this was the solution: `preg_replace('/\(\.\*\)(*SKIP)(*FAIL)|([\/$^&*()_+{}[\]|.?\\\])/s', '\\\$1', $var)` – Mana Oct 06 '17 at 10:21
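
The approach from the last comment can be tried end to end roughly as follows. This is only a sketch: the helper name `escape_outside_wildcards`, the `~` delimiter, and the sample strings are assumptions made for illustration, and the character class is narrowed to the metacharacters listed in the question rather than copied from the comment.

```php
<?php
// Hypothetical helper: escape regex metacharacters everywhere except
// inside the literal placeholder "(.*)".
function escape_outside_wildcards(string $template): string
{
    // First alternative consumes "(.*)" and discards it via (*SKIP)(*FAIL);
    // second alternative matches one metacharacter from the question's list.
    $pattern = '~\(\.\*\)(*SKIP)(*FAIL)|[.\\\\^$|?*+()[\]{}/]~';

    // '\\\\$0' is the PHP spelling of "one literal backslash, then the match".
    return preg_replace($pattern, '\\\\$0', $template);
}

// Illustrative sample data, not taken from the thread.
$var = 'Foo (.*) bar (.*) b*l?a$ bla';
$escaped = escape_outside_wildcards($var);

// Use the escaped template as a pattern against another string.
$found = preg_match('/' . $escaped . '/s', 'Foo 123 bar xyz b*l?a$ bla', $match);

var_dump($escaped, $found, $match);
```

Because `/` is in the escaped set, the result can be wrapped in `/` delimiters as the question does.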

3 Answers

2

Edit3

It appears as if it didn't work for you, so here's another attempt. And since mickmack seems to be worried about performance, he'll be glad that it's down to 146 steps ;)

Replace

([\w\s]*(?:\([^)]*\)[\w\s]*)*)([*?$&])

with

$1\\$2

Here at regex101.

It captures an optional range of non special characters. It goes on capturing an optional parenthesized group, followed by an optional range of non special characters. This last part can repeat any number of times. Finally it captures the special character.

So we have two capture groups: one with the text leading up to the special character (if any), and one with the special character itself.

Replacing each match with the contents of the two groups, with a \ in between, does the trick.
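
In PHP, the replacement described above might look like the sketch below. The sample string is modelled on the question (with the space inside the parentheses removed) and the variable names are illustrative assumptions.

```php
<?php
// Sample input modelled on the question.
$var = 'Foo (.*) bar blla (.*) b*l?a$ bla bla';

// Group 1: plain text and parenthesized groups leading up to a special character.
// Group 2: the special character itself.
$pattern = '/([\w\s]*(?:\([^)]*\)[\w\s]*)*)([*?$&])/';

// '$1\\\\$2' reaches preg_replace as $1\\$2: group 1, a literal backslash, group 2.
$result = preg_replace($pattern, '$1\\\\$2', $var);

echo $result, PHP_EOL;
```

The four backslashes are needed because the PHP string literal halves them once and the replacement syntax halves them again.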

This is also more flexible with the parentheses part (happy mick? ;). It allows more complex regexes inside the parentheses (just not nested parentheses).

If the new requirement of handling backslashes isn't a must, and a negated word class (\W) is OK, we're down to a blazing 76 steps :) Here at regex101.

--Original answer--

This is one way of doing it - replace

(?<!\(|\(.|\(..)([^\w\s])(?![^(]*\))

with

\\$1

Note! A literal backslash in the replacement is written as \\, and each of those has to be escaped again in the PHP string literal, i.e. '\\\\$1'.

Since PHP only allows fixed-width look-behinds, it tests that there isn't an opening parenthesis before the special character in four steps with the (?<!\(|\(.|\(..|\(...) construct. Then it matches, and captures, the special character (neither a word character nor whitespace). Lastly it uses a negative look-ahead to make sure it isn't followed by a closing parenthesis. Checking the parentheses both before and after may be redundant though.

Replacing the matched, and captured, character by itself - $1 - preceded by the wanted escape character \ will do the trick.

See it here at regex101.

Edit

Here's an alternative way if the special characters are limited to the one in your example - use

(?<!\(\.)([*?$&])(?!\))

as the search string and replace with \\$1.

It matches your special characters as long as they're not preceded by (., nor followed by ).
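
A minimal PHP sketch of this look-around variant, assuming the sample string from the question without the space inside the parentheses. The replacement is spelled '\\\\$1' so that PHP emits one literal backslash before the captured character; the variable names are illustrative.

```php
<?php
// Sample input modelled on the question.
$var = 'Foo (.*) bar blla (.*) b*l?a$ bla bla';

// Match one of * ? $ & unless it is preceded by "(." or followed by ")".
$pattern = '/(?<!\(\.)([*?$&])(?!\))/';

// Prefix each match with a literal backslash.
$result = preg_replace($pattern, '\\\\$1', $var);

echo $result, PHP_EOL;
```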

Here at regex101.

(Neither approach is watertight, since both would fail to escape the & in (.& ).)

Edit2

Updated since OP changed escape character in question from / to \. And removed the space inside the capturing group as it was not wanted by OP.

SamWhan
  • Thank you, but The /$1 does not escape the characters, the single * should be escaped like \* but /$1 replaces the * to /* – Mana Jun 13 '17 at 09:05
  • ??? From your question: I want this result: `$var = "Foo (.* ) bar bla(.*) b/*l/?a/$ /&/& bla bla ";` - the `*` is *quoted* with a `/`... – SamWhan Jun 13 '17 at 09:07
  • If `*` should never be escaped - then what are your *special characters*. Because replacing `([?$&])` with `/$1` sounds like what you want then... – SamWhan Jun 13 '17 at 09:14
  • Normally you escape special characters like * with a back slash – Mana Jun 13 '17 at 09:16
  • if I have * or ? or any other special character alone it should be escaped . If * is in (.*) it should be ignored – Mana Jun 13 '17 at 09:18
  • Three comments ago you complained that my solution *escaped* the single `*`. Now you say it should be (as you do in the question). – SamWhan Jun 13 '17 at 09:44
  • I have updated the question can you please take a look I think now it 's more clear why I need to escape the single special characters. – Mana Jun 13 '17 at 09:53
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/146514/discussion-between-clasg-and-mana). – SamWhan Jun 13 '17 at 09:54
1

Here are a few patterns that outperform ClasG's answer:

Input: Foo (.* ) bar blla (.* ) b*l?a$ && bla bla

Pattern: /\([^)]*\)(*SKIP)(*FAIL)|([^a-z\d ])/i Replace with: \\\1

Output: Foo (.* ) bar blla (.* ) b\*l\?a\$ \&\& bla bla

Pattern Demo (just 122 steps)

Basically it just skips the "protected" parenthetical portion and matches any character that is not a letter, a digit, or a space.
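
As a rough PHP sketch of the first pattern, using the input and output quoted above (the variable names are assumptions, and the replacement is written as '\\\\$1', which is one way to spell "literal backslash followed by group 1" in a PHP string):

```php
<?php
$input = 'Foo (.* ) bar blla (.* ) b*l?a$ && bla bla';

// Left alternative: consume a parenthesized chunk, then (*SKIP)(*FAIL) throws the
// match away and resumes scanning after it, so its contents are never escaped.
// Right alternative: capture any character that is not a letter, digit, or space.
$pattern = '/\([^)]*\)(*SKIP)(*FAIL)|([^a-z\d ])/i';

$output = preg_replace($pattern, '\\\\$1', $input);

echo $output, PHP_EOL;
```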


If you want to specifically list the symbols, you can just change the negated character class to the character class in the OP like this: (still 122 steps)

/\([^)]*\)(*SKIP)(*FAIL)|([-\/~`!@#$%^&*()_+={}[\]|;:'"<>,.?\\])/

or you can use only the symbols in your sample, here's the full pattern (still 122 steps):

/\([^)]*\)(*SKIP)(*FAIL)|([*?$&])/

All of ClasG's patterns are slower than my 3 patterns above:

ClasG's written pattern: (?<!\(|\(.|\(..)([^\w\s])(?![^(]*\)) fails and takes 418 steps - demo

ClasG's linked demo pattern: (?<!\(|\(.|\(..)([^\w\s])(?![^(]*\)) is correct but takes 367 steps - demo

ClasG's third pattern: (?<!\(\.)([*?$&])(?!\)) is correct but has a strict requirement for the parenthetical portion. It is the best pattern in that answer taking 186 steps - demo.

mickmackusa
  • Okay thank you, but your pattern doesn't match the "(" in this case (.*)((.*) – Mana Jun 16 '17 at 08:31
  • There is a problem with the backslash when the string contains a \n , the "\" is escaped and it is no longer a line break. The \ followed by "n" should be ignored. – Mana Jun 16 '17 at 08:58
  • @Mana do your parenthetical statements always only contain dot asterisk? – mickmackusa Jun 16 '17 at 12:43
  • Nope, I am using your pattern , but had to change it: `\(\.\*\)(*SKIP)(*FAIL)|([\/$^&*()_+{}[\]|.?])` I don't know if it 's correct but it actually works for me – Mana Jun 16 '17 at 14:05
  • @Mana if the answer to my question about your "protected" `(.*)` is "Nope" then your pattern will eventually fail you. For me to provide the most accurate pattern, it is imperative that you provide several relevant / different samples that represent the fringe cases of your project, so that I can analyze which parts are static and which parts change. Please post a few new samples as an edit to your question and their expected results (to be fair to other volunteers and to make your question clear). I'll check your question tomorrow to see if I can provide you with a reliable pattern. – mickmackusa Jun 16 '17 at 14:48
  • @Mana Does this suit all of your known cases? `(?:\([^)(]*\)|\\[rnst])(*SKIP)(*FAIL)|([^a-z\d ])` I will edit my answer if this is what you want, but you should also update your question so that my answer makes sense to future readers. https://regex101.com/r/iTGMho/6 – mickmackusa Jun 16 '17 at 21:08
  • I have updated my question, to make it clear, I made a mistake in the inputs, so only the special characters that need to be escaped in a regex expression should be escaped not all of them , so the "&" in the example should not be escaped.. I m sorry! I thought it was a special character in regex expressions.. – Mana Jun 17 '17 at 06:15
  • But why the pattern , I mentioned in the comment will fail? Could you please explain ? – Mana Jun 17 '17 at 06:17
  • @Mana I'm spending the weekend with my family, I'll respond when I get back to my computer. – mickmackusa Jun 17 '17 at 08:06
-1

Use preg_replace_callback; you can try the regexp at https://regex101.com/r/52qQwv/1

$s = 'Foo (.*) bar blla (.*) b*l?a$&& bla bla';
$regexp = '/([\.\*\?\&\$])[\w\s\&]/iu';
$f = function ($matches) {
    return '/' . $matches[1];
};
$a = preg_replace_callback($regexp, $f, $s);
var_dump($a);

string(39) "Foo (.) bar blla (.) b/*/?/$/&bla bla"

Knase
  • Changing the slash to backslash (as dictated by OP's question edit) in the return will cause a syntax error. This pattern has needless escaping in the character classes and unnecessary flags. Please either edit or delete this answer because it is not useful and teaches bad practices. – mickmackusa Jun 14 '17 at 02:29