
I've got a string that is UTF-8 encoded according to mb_detect_encoding(). I want to trim it like this:

```php
$string = trim($string);
```

But it has no effect.
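For reference, `trim()` without a second argument strips only six bytes from both ends: space (`\x20`), tab, newline, carriage return, NUL and vertical tab. A small sketch with made-up sample strings, showing that ordinary spaces are removed but a non-breaking space (U+00A0, the two bytes `C2 A0` in UTF-8) is not:

```php
<?php
// trim() defaults to stripping " \t\n\r\0\x0B" from both ends.
$ascii = "   String more text   ";
var_dump(trim($ascii)); // string(16) "String more text"

// U+00A0 (NO-BREAK SPACE) is not in the default list, so nothing is removed.
$nbsp = "\u{00A0}\u{00A0}String more text\u{00A0}";
var_dump(trim($nbsp) === $nbsp); // bool(true)
```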

When I look at the string with urlencode($string) it displays:

"++++++++++++++++String+more+text++++++++++++"

Following https://markushedlund.com/dev/trim-unicodeutf-8-whitespace-in-php/ I tried this code, but it had no effect either:

```php
$string = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);
```
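Note that `preg_replace()` does not modify its argument in place; it returns a new string, so the result has to be assigned (or otherwise used) to have any effect. A minimal sketch on a made-up input padded with non-breaking spaces:

```php
<?php
// \pZ = Unicode separators (includes U+0020 and U+00A0),
// \pC = control/format/unassigned characters; /u treats the input as UTF-8.
$string = "\u{00A0} String more text \u{00A0}";
$trimmed = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);
var_dump($trimmed); // string(16) "String more text"
```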

How do I trim this? How can I find out which character this "space" actually is, and then replace it? All I know is urlencode(), but that just tells me it's a space by showing +++.
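One way to see exactly which characters a string contains is to print the Unicode code point of each one. A sketch assuming PHP 7.4+ (for `mb_str_split()`) and UTF-8 input; the helper name `dump_code_points` is hypothetical:

```php
<?php
// Hypothetical helper: list every character of a UTF-8 string as U+XXXX.
function dump_code_points(string $s): array
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $char) {
        $out[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $out;
}

// A non-breaking space, an ordinary space, then "a".
echo implode(' ', dump_code_points("\u{00A0} a")), PHP_EOL; // U+00A0 U+0020 U+0061
```

A plain space shows up as `U+0020`; anything else in the padding (for example `U+00A0` or `U+200B`) explains why `trim()` leaves it alone.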

Update: Thanks to @Stefanov.sm in the comments below, I learned that you can output the string as hex with bin2hex($string). Then I see a whole lot of 20202020, and 20 stands for a space in UTF-8. Strangely, trim still doesn't work, but this does:

```php
$string = str_replace("\x20","",$string);
```

Maybe I can figure out why. But at least the goal of getting rid of them is achieved.
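Be aware that `str_replace()` removes every matching character, not just those at the ends, so the spaces between words disappear as well. A small comparison on a made-up input:

```php
<?php
$string = "   String more text   ";

// str_replace() removes all spaces, including the ones between words.
var_dump(str_replace("\x20", "", $string)); // string(14) "Stringmoretext"

// trim() only removes them from the start and the end.
var_dump(trim($string)); // string(16) "String more text"
```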

  • According to [the manual](https://www.php.net/manual/en/function.urlencode.php), the `+` is how a space gets encoded. – KIKO Software Oct 09 '20 at 09:15
  • Sorry, I updated the question. I understand the + stands for a space, but how do I trim it? I can't get rid of these spaces. –  Oct 09 '20 at 09:17
  • Without a reproducible example it is difficult to say anything about this. – KIKO Software Oct 09 '20 at 09:18
  • There might be non-printable Unicode characters in your initial string. Can you hex-dump it with [bin2hex](https://www.php.net/manual/en/function.bin2hex.php) first and have a look? Or apply `mb_convert_encoding` into your relevant codepage and then `trim`? – Stefanov.sm Oct 09 '20 at 09:31
  • Hi @Stefanov.sm thanks, I did not think of that. OK, when I run it through bin2hex I get a whole lot of: "2020202020202020". This seems to stand for a space character: https://www.fileformat.info/info/unicode/char/20/index.htm. Now, how do I get rid of them? I tried str_replace("\0x20","",$string); but it doesn't work. –  Oct 09 '20 at 10:15
  • @RobbertRenolds \x20 is actually a space and `trim` should remove it by default. Could it be that there is something non-printable **before** the spaces? Could you please paste the hex string? – Stefanov.sm Oct 09 '20 at 10:22
  • Thanks @Stefanov.sm, that was it: $string = str_replace("\x20","",$string); works. But then it's strange that trim won't fix it. Encoding is not my favorite part; I find it hard to follow that you see one thing, but under the hood it's something totally different. There are no other characters in the hex string, just a lot of 2020202020 and the normal characters of the words. –  Oct 09 '20 at 10:29
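The `str_replace("\0x20", ...)` attempt in the comments fails because, inside double quotes, `\0` is an octal escape for the NUL byte, so `"\0x20"` is the four bytes NUL, `x`, `2`, `0` rather than a space. `"\x20"` is a hexadecimal escape for the single space character. A quick check:

```php
<?php
// "\0" is an octal escape (NUL), followed by the literal characters "x20".
var_dump(strlen("\0x20"));  // int(4)
var_dump(bin2hex("\0x20")); // string(8) "00783230"

// "\x20" is a hexadecimal escape for one space.
var_dump("\x20" === " ");   // bool(true)
var_dump(bin2hex("\x20"));  // string(2) "20"
```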

3 Answers


The "+" signs stand for whitespace.

What you should try is the mb_detect_encoding function, to be sure of the encoding: https://www.php.net/manual/fr/function.mb-detect-encoding.php

```php
<?php
    // Strict mode with one candidate: returns "UTF-8" if $str is valid UTF-8, otherwise false
    mb_detect_encoding($str, 'UTF-8', true);
?>
```
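In strict mode with a single candidate, `mb_detect_encoding()` returns the encoding name or `false` rather than a boolean; `mb_check_encoding()` gives a plain true/false validity check. A sketch with made-up inputs (`"\xC3\xA9"` is é in UTF-8, while a lone `"\xE9"` byte is not valid UTF-8):

```php
<?php
var_dump(mb_detect_encoding("caf\xC3\xA9", 'UTF-8', true)); // string(5) "UTF-8"
var_dump(mb_detect_encoding("caf\xE9", 'UTF-8', true));     // bool(false)

var_dump(mb_check_encoding("caf\xC3\xA9", 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("caf\xE9", 'UTF-8'));     // bool(false)
```

Either way, a valid-UTF-8 result says nothing about which whitespace characters the string contains.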
Colin
dsqezfzdef

Try explicitly naming "+" for removal:

```php
$string = trim($string, "+ ");
```

Note the space after "+", which means "remove both spaces and plus-signs".
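A quick sketch of that character list on a made-up input. Every character in the second argument is stripped from both ends, spaces between words are kept, and once a list is given the default whitespace set (tab, newline and so on) is no longer stripped unless it is included:

```php
<?php
var_dump(trim("++ String more text ++", "+ ")); // string(16) "String more text"

// Tab is not in the list "+ ", so nothing is removed here.
var_dump(trim("\t+x+\t", "+ ") === "\t+x+\t");  // bool(true)
```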

Encoding probably has nothing to do with this, unless those pluses are a misrepresentation of some other character.

Zsolt Szilagyi
  • Those pluses only appear because of the `urlencode()`. The OP is wondering why the spaces weren't removed by `trim()`. – KIKO Software Oct 09 '20 at 09:25
  • Ah, good point. Still, a space is part of the ASCII set and should be the same byte in any ASCII-compatible encoding such as UTF-8. I guess there is some other issue in the code. – Zsolt Szilagyi Oct 09 '20 at 09:31

You could try this multibyte trim function:

```php
if (!function_exists('mb_trim')) { // PHP 8.4+ already ships a built-in mb_trim()
    function mb_trim($str) {
        return preg_replace("/^\s+|\s+$/u", "", $str);
    }
}
```

No guarantee it will solve the problem, but it can't hurt.

I found it here: Multibyte trim in PHP?
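Since PHP 8.4 ships its own `mb_trim()`, declaring a user function with that name fails there. A sketch of the same regex under a hypothetical name, `unicode_trim`, to sidestep the clash. Whether `\s` also matches non-ASCII spaces such as U+00A0 depends on PCRE's Unicode-property handling, so the explicit `[\pZ\pC]` class from the question is the safer choice if that matters:

```php
<?php
// Hypothetical name, chosen to avoid PHP 8.4's built-in mb_trim().
// preg_replace() returns null on error (e.g. invalid UTF-8 with /u), hence ?string.
function unicode_trim(string $str): ?string
{
    return preg_replace('/^\s+|\s+$/u', '', $str);
}

var_dump(unicode_trim("\t  String more text \n")); // string(16) "String more text"
```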

KIKO Software
  • Thanks for your help, it did not work, seems not to be the case. See updated post. –  Oct 09 '20 at 10:33