I have the following regular expression:

$url = "http://example.com?param=test1\test2\test3\test4";

$cleanUrl = preg_replace('|[^a-z0-9-~+_.?\[\]\^#=!&;,/:%@$\|*`\'<>"()\\x80-\\xff\{\}]|i', '', $url);

I get the following output:

http://example.com?param=test1est2est3est4

But I'm expecting the following output:

http://example.com?param=test1\test2\test3\test4

I tried this regex, but it's not working:

    $cleanUrl = preg_replace('|[^a-z0-9-~+_.?\[\]\^\\#=!&;,/:%@$\|*`\'<>"()\\x80-\\xff\{\}]|i', '', $url);
                                                    ^ escaped single quote
Kristian Vitozev
  • Are you sure about the input string? See https://ideone.com/OnepGA. I think it should be `$url = "http://example.com?param=test1\\test2\\test3\\test4";`. Then use `$cleanUrl = preg_replace('|[^-\\\\a-z0-9~+_.?\[\]\^#=!&;,/:%@$\|*\`\'<>"()\x80-\xff\{\}]|i', '', $url);`. See [this demo](https://ideone.com/tUHtU3). – Wiktor Stribiżew Sep 29 '16 at 12:33
  • You have to use \\\\ to escape a backslash. Read this [SO Answer](http://stackoverflow.com/a/4025505/5447994) – Thamilhan Sep 29 '16 at 12:42

1 Answer

What you are doing can perhaps be achieved by other means, but to answer your question: your input string does not contain backslashes. It contains tab characters, because inside a double-quoted string literal \t is an escape sequence for a tab.
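To make the difference visible, here is a minimal sketch comparing the two quoting styles. The variable names $double and $single are only illustrative and do not come from the question.

```php
<?php
// Double-quoted: PHP converts \t into a single tab character (0x09).
$double = "test1\test2";
// Single-quoted: \t stays as two characters, a backslash and the letter t.
$single = 'test1\test2';

var_dump(strlen($double));                  // int(10)
var_dump(strlen($single));                  // int(11)
var_dump(strpos($double, "\t") !== false);  // bool(true): contains a tab
var_dump(strpos($single, '\\') !== false);  // bool(true): contains a backslash
```

The tab is not in the allowed character class, so the original regex removes it, which is why the t after each separator disappears from the output.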

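Once you use a single-quoted literal, \t is two characters: a backslash followed by t. However, the negated character class in your regex does not include the backslash, so backslashes would still be stripped. Add it to the class with \\\\: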

$url = 'http://example.com?param=test1\test2\test3\test4';
$cleanUrl = preg_replace('|[^-\\\\a-z0-9~+_.?\[\]^#=!&;,/:%@$\|*`\'<>"()\x80-\xff{}]|i', '', $url);
echo $cleanUrl;

See this PHP demo printing http://example.com?param=test1\test2\test3\test4.
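The four backslashes are needed because the pattern is unescaped twice: first by PHP's string parser, then by the regex engine. The sketch below uses a deliberately shortened, hypothetical character class (not the full one above) to show just that step.

```php
<?php
// In a single-quoted PHP literal, '\\\\' is the two-character string \\ ...
var_dump(strlen('\\\\'));  // int(2)

// ... which PCRE then reads as one escaped, literal backslash inside the class.
$pattern = '|[^\\\\a-z0-9]|i';

$input  = 'test1\test2 and more!';
$result = preg_replace($pattern, '', $input);

echo $result, PHP_EOL;     // test1\test2andmore
```

With only two backslashes in the literal, the regex engine would receive \a, which PCRE treats as an escape for the bell character rather than as a literal backslash.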

I also moved the - to the beginning of the character class (it is best practice to put a literal hyphen at the start or end). A ^ that is not in the initial position of the character class does not have to be escaped. The same goes for { and }, and also for [, although that square bracket is better left escaped, since some regex flavors disallow an unescaped [ inside a character class.
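As an illustration of why hyphen placement matters, this small sketch uses two hypothetical classes that are not part of the original pattern. A hyphen between two characters forms a range, so [+-.] also matches the comma that sits between + and . in ASCII.

```php
<?php
// Hyphen in the middle: + through . is a range (0x2B to 0x2E), which includes the comma.
$asRange = preg_match('/^[+-.]$/', ',');

// Hyphen first: it is a literal, so only -, + and . are matched.
$asLiteral = preg_match('/^[-+.]$/', ',');

var_dump($asRange);    // int(1)
var_dump($asLiteral);  // int(0)
```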

Wiktor Stribiżew