I'm trying to strip hidden control characters (especially \x{89} and \x{88}) from a string with preg_replace(). The string is "ˆText" (it starts with a "\x{88}" char), and mb_detect_encoding() says it is UTF-8.

The code used is $result = preg_replace('/\x{88}/u','',$string); but the result is null.

If I run the code without the /u modifier I get "�Text": the control char is replaced with a replacement character (U+FFFD).

I'm using PHP 7.1 on Windows. The same search in BBEdit and Notepad++ replaces the chars correctly.

Any ideas?

Thanks, A.

alessandro
  • try reading [this](https://stackoverflow.com/questions/1497885/remove-control-characters-from-php-string) – Chin. Udara Dec 25 '20 at 09:23
  • Thanks, I tried all the solutions but they don't work for me. – alessandro Dec 25 '20 at 09:29
  • If `preg_replace` returns null then it is due to an error. Try calling `preg_last_error` after your `preg_replace`. Then compare the error code with the errors listed in the docs [here](https://www.php.net/manual/en/function.preg-last-error.php) – Chin. Udara Dec 25 '20 at 09:34
  • `ˆ` is not `\x{88}`, it is `\x{2C6}`. Also, why not just use `str_replace("\u{02C6}", "", $string)`? – Wiktor Stribiżew Dec 25 '20 at 11:24
  • `preg_last_error` returns code "4", which is PREG_BAD_UTF8_ERROR. Thanks. – alessandro Dec 25 '20 at 12:13
  • @Wiktor Stribiżew - The original char `\x{88}` was replaced when posted. I also tried your suggestion but it doesn't work. Thanks. – alessandro Dec 25 '20 at 14:22
  • Solved this with [iconv( 'UTF-8', "ISO-8859-1//IGNORE",$)](https://stackoverflow.com/questions/50074737/cleaning-sql-incorrect-string-value-error-from-php) – alessandro Dec 26 '20 at 09:04

1 Answer


preg_replace() returns null only on error. Run preg_last_error() right after preg_replace() and check the returned error code.
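
A minimal sketch of that check, assuming the input is a lone 0x88 byte followed by ASCII text (the `$string` value here is hypothetical and only stands in for the real data):

```php
<?php
// Hypothetical input: "\x88" on its own is not a valid UTF-8 sequence.
$string = "\x88Text";

// With the /u modifier, PCRE validates the subject as UTF-8 before matching.
$result = preg_replace('/\x{88}/u', '', $string);

// preg_last_error() reports the status of the most recent PCRE call.
$code = preg_last_error();

if ($result === null) {
    if ($code === PREG_BAD_UTF8_ERROR) {
        echo "Subject is not valid UTF-8 (error code $code)\n";
    } else {
        echo "PCRE error code $code\n";
    }
}
```

PREG_BAD_UTF8_ERROR means the subject (or the pattern) failed UTF-8 validation, so the regex itself never ran.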


As a side note: your wording suggests that you want to strip all control characters, not just the two explicitly mentioned. In that case you would be better off matching against "\p{Cc}":

preg_replace('/\p{Cc}/u', '', $string);
Marc
  • You are right, the function returns the 4/PREG_BAD_UTF8_ERROR error code. The files come out of a shell_exec pre-processor that should strip all control chars but sometimes fails on some characters. I suppose these files have a character encoding problem, so I'm trying to fix it, without success. Thanks. – alessandro Dec 25 '20 at 14:11
  • It also fails with `preg_replace('/\p{Cc}/u', '', $string);`, same error "4". It seems it is not possible to parse this string. – alessandro Dec 25 '20 at 14:29
  • The string you are trying to parse is just not valid UTF-8. If you get the string from shell_exec(), the called executable is not delivering its output in the UTF-8 charset. – Marc Dec 26 '20 at 15:28
  • If the script is running on a Linux machine, check the output of the "locale" command. If the default locale is not "xx_XX.UTF-8" but something like xx_XX.ISO8859-1, then that might be the charset your executable is using. You can still convert that with mb_convert_encoding() before running preg_replace() – Marc Dec 26 '20 at 15:35
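
The conversion suggested in the last comment could look roughly like this. It is only a sketch: the `$string` value is hypothetical, and ISO-8859-1 is an assumption taken from the comment, so substitute whatever charset the executable really emits.

```php
<?php
// Hypothetical input: a lone 0x88 byte followed by ASCII text.
$string = "\x88Text";

// Strict validity check; false here, because 0x88 alone is not valid UTF-8.
$isUtf8 = mb_check_encoding($string, 'UTF-8');

// Assumption: the source bytes are ISO-8859-1.
$utf8 = $isUtf8 ? $string : mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');

// After conversion, 0x88 is U+0088, a control character matched by \p{Cc}.
$clean = preg_replace('/\p{Cc}/u', '', $utf8);

var_dump($clean); // string(4) "Text"
```

If the source charset is actually something else (Windows-1252, for example, maps 0x88 to the printable "ˆ"), the byte converts to a non-control character and \p{Cc} will not remove it.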