
I want to trim full-width spaces from the beginning and end of strings. The strings can contain Japanese and/or English characters. However, it does not work correctly for strings that start with hiragana or katakana.

//test1
$text = "  romaji  ";
var_dump(trim($text," ")); // returns "romaji"

//test2 
$text = "  ひらがな  ";
var_dump(trim($text," ")); // returns "��らがな"

//test3
$text = "  カタカナ  ";
var_dump(trim($text," ")); // returns "��タカナ"

//test4 
$text = "  漢字  ";
var_dump(trim($text," ")); // returns "漢字"

Why does the code not work for tests 2 and 3? How can we solve it?

Nabil Farhan
    Maybe this answer https://stackoverflow.com/a/10067670/4734154 is helpful. PHP trim and UTF-8 may be tricky. – Gerald Zehetner Nov 30 '20 at 17:01
    Does this answer your question? [Multibyte trim in PHP?](https://stackoverflow.com/questions/10066647/multibyte-trim-in-php) ... solution is there. Non-UTF-8 aware PHP functions are known to potentially mangle multibyte characters. For the record, the "spaces" in your tests are `U+3000 : IDEOGRAPHIC SPACE`. – Markus AO Nov 30 '20 at 17:24

1 Answer


This is hard to troubleshoot. It is described in more detail here:

  1. PHP output showing little black diamonds with a question mark

  2. Trouble with UTF-8 characters; what I see is not what I stored
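For this particular case, the cause can be seen by looking at the raw bytes. The sketch below assumes PHP 7.0+ (for the `\u{...}` escape), a source file saved as UTF-8, and that the spaces in the question are U+3000 IDEOGRAPHIC SPACE, as noted in the comments. trim() treats its second argument as a list of single bytes, not as characters, so it strips the bytes e3 and 80 wherever they appear at either end. Hiragana and katakana also start with the byte e3, while this kanji starts with e6.

```php
<?php
// U+3000 IDEOGRAPHIC SPACE and the first character of each test string, as UTF-8 bytes.
$space = "\u{3000}";
echo bin2hex($space), "\n";   // e38080
echo bin2hex("ひ"), "\n";      // e381b2 (shares the lead byte e3 with the space)
echo bin2hex("カ"), "\n";      // e382ab (shares the lead byte e3 with the space)
echo bin2hex("漢"), "\n";      // e6bca2 (different lead byte, so it survives)

// trim() removes every leading/trailing byte that is e3 or 80,
// so the first byte of ひ is stripped together with the spaces.
$trimmed = trim($space . $space . "ひらがな" . $space . $space, $space);

// The result now begins with the orphaned bytes 81 b2, which is invalid UTF-8
// and is what shows up as the two replacement characters in the question.
echo substr(bin2hex($trimmed), 0, 4), "\n";   // 81b2
```

The trailing end is unaffected in these tests because the last bytes of な, ナ, and 字 are neither e3 nor 80.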

To work around this, you can use str_replace to replace every space in the string with nothing. This removes all spaces, including those inside the string, so it is not recommended for sentences, but it works well for single words.

$text = "  ひらがな  ";
$new_str = str_replace(' ', '', $text);
echo $new_str;    // returns ひらがな

If you want to remove spaces only at the beginning and end, use a regular expression with preg_replace:

print preg_replace('/^\s+|\s+$/u', '', "    ひらがな ひらがな"); // prints "ひらがな ひらがな"
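The regex approach can be wrapped in a small helper. This is only a sketch: `trim_fullwidth` is a hypothetical name, not a PHP built-in, and it assumes PHP 7.0+ and UTF-8 input. It lists U+3000 explicitly next to `\s` so the intent does not depend on how `\s` is interpreted, and it relies on the `u` modifier so the pattern matches whole characters rather than single bytes.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): strips ASCII whitespace and
 * U+3000 IDEOGRAPHIC SPACE from both ends of a UTF-8 string.
 */
function trim_fullwidth(string $text): string
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8,
    // so \x{3000} matches one whole character instead of individual bytes.
    $result = preg_replace('/^[\s\x{3000}]+|[\s\x{3000}]+$/u', '', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8 input);
    // in that case, fall back to the unmodified input.
    return $result ?? $text;
}

// Leading and trailing full-width spaces are removed; the inner one is kept.
var_dump(trim_fullwidth("\u{3000}\u{3000}ひらがな\u{3000}ひらがな\u{3000}"));
var_dump(trim_fullwidth("  romaji  "));
```

Unlike the str_replace approach, this leaves spaces inside the string untouched, so it is also suitable for sentences.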

Note that trim is actually about nine times faster than preg_replace, but the regex approach still works.
See the speed comparison here:

https://stackoverflow.com/a/4787238/10915534

Ahmed Ali