71

Here is some simple code:

<?php

$var = "Бензин Офиси А.С. также производит все типы жира и смазок и их побочных        продуктов в его смесительных установках нефти машинного масла в Деринце, Измите, Алиага и Измире. У Компании есть 3 885 станций технического обслуживания, включая сжиженный газ (ЛПГ) станции под фирменным знаком Петрогаз, приблизительно 5 000 дилеров, двух смазочных смесительных установок, 12 терминалов, и 26 единиц поставки аэропорта.";

$foo = substr($var,0,142);

echo $foo;
?>

and it outputs something like this:

Бензин Офиси А.С. также производит все типы жира и смазок и их побочных продук�...

I tried mb_substr() with no luck. How do I do this the right way?

PeeHaa
Nazar
  • `mb_substr()` is the way to go; this happens when a multi-byte character gets cut in half. Can you show what you tried with that and how it failed? – Pekka Jan 31 '12 at 21:53
  • Did you specify the encoding (last parameter) when you tried `mb_substr`? – John Flatness Jan 31 '12 at 21:54
  • That's exactly what I tried to do. I don't have it up on the internet, so I can't provide a link. It's a long description of the company, which I cut to 142 characters to display on the home page of a website. – Nazar Jan 31 '12 at 21:56
  • @JohnFlatness No, I didn't specify it, I just replaced substr() with mb_substr(). Let me check. – Nazar Jan 31 '12 at 21:56
  • OK, thank you very much! I didn't specify the last argument of the mb_substr() function, which is "UTF-8", as @JohnFlatness noted. Now everything works great! Thank you very much, guys! – Nazar Jan 31 '12 at 22:00

7 Answers

134

The comments above are correct so long as you have mbstring enabled on your server.

$var = "Бензин Офиси А.С. также производит все типы жира и смазок и их побочных        продуктов в его смесительных установках нефти машинного масла в Деринце, Измите, Алиага и Измире. У Компании есть 3 885 станций технического обслуживания, включая сжиженный газ (ЛПГ) станции под фирменным знаком Петрогаз, приблизительно 5 000 дилеров, двух смазочных смесительных установок, 12 терминалов, и 26 единиц поставки аэропорта.";

$foo = mb_substr($var,0,142, "utf-8");

Here are the PHP docs:

http://php.net/manual/en/book.mbstring.php
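To see why the byte-based cut breaks, here is a minimal sketch. It assumes the mbstring extension is enabled and that the source file is saved as UTF-8; the short sample word is taken from the question's string and the variable names are illustrative only.

```php
<?php
// Each Cyrillic letter in this word takes 2 bytes in UTF-8.
$text = "Бензин";

echo strlen($text), "\n";             // byte count: 12
echo mb_strlen($text, "UTF-8"), "\n"; // character count: 6

// substr() counts bytes, so an odd length splits the second letter in half.
$broken = substr($text, 0, 3);
var_dump(mb_check_encoding($broken, "UTF-8")); // bool(false)

// mb_substr() counts characters, so every letter stays intact.
$clean = mb_substr($text, 0, 3, "UTF-8");
echo $clean, "\n"; // Бен
```

Passing the encoding explicitly avoids depending on the server's mbstring configuration, which is what went wrong in the question.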

Kai Qing
  • Thank you! The last argument that I missed was "UTF-8", though I had looked through the documentation. – Nazar Jan 31 '12 at 22:02
6

A proper (logical) alternative for Unicode strings:

<?php
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$str = "Büyük";
$s = 0; // start from "0" (nth) char
$l = 3; // get "3" chars
echo substr($str, $s, $l) ."\n";    // Bü
echo mb_substr($str, $s, $l) ."\n"; // Bü if the mbstring internal encoding is not UTF-8, otherwise Büy
echo substr_unicode($str, $s, $l);  // Büy
?>

See also the PHP manual entry for mb_substr.

Botir Ziyatov
6

If your strings may contain Unicode (multi-byte) characters and you don’t want to break these, replace substr with one of the following two, depending on what you want:

Limit to 142 characters:

mb_substr($var, 0, 142);

Limit to 142 bytes:

mb_strcut($var, 0, 142);
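
The difference between the two can be sketched as follows. This assumes mbstring is enabled and the file is saved as UTF-8; the sample word and variable names are illustrative, and the encoding is passed explicitly so the result does not depend on server settings.

```php
<?php
$text = "Büyük"; // 5 characters, 7 bytes in UTF-8

// Limit by characters: exactly 4 characters come back.
$byChars = mb_substr($text, 0, 4, "UTF-8");

// Limit by bytes: byte 5 falls inside the second "ü",
// so mb_strcut() stops before that character.
$byBytes = mb_strcut($text, 0, 5, "UTF-8");

echo $byChars, "\n"; // Büyü
echo $byBytes, "\n"; // Büy
```

The byte limit is useful when the result has to fit a fixed-size storage field; the character limit is usually what you want for display.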
caw
4

PHP5 does not understand UTF-8 natively. It is proposed for PHP6, if it ever comes out.

Use the multibyte string functions to manipulate UTF-8 strings safely.

For instance, mb_substr() in your case.
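
As a rough illustration, assuming mbstring is enabled and the file is saved as UTF-8, you can set the internal encoding once with mb_internal_encoding() so the mb_* functions no longer need an explicit encoding argument. The sample word comes from the question's string.

```php
<?php
// Set the default once; later mb_* calls pick it up.
mb_internal_encoding("UTF-8");

$word = "Офиси";

echo strlen($word), "\n";          // bytes: 10
echo mb_strlen($word), "\n";       // characters: 5
echo mb_strtoupper($word), "\n";   // ОФИСИ
echo mb_substr($word, 0, 2), "\n"; // Оф
```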

thwd
  • It turns out they skipped PHP 6 and went straight to PHP 7. There is still no native unicode support. Perl has had it since at least Perl 5.6. – Anthony Rutledge Dec 08 '19 at 23:25
2

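Never pass a constant length to substr() for a UTF-8 string:

$st = substr($text, $beg, 100);

There is a good chance you will get half of a character at the end of the string.

Instead, find the boundaries with strpos():

$position_begin = strpos($text, $first_symbol);
$position_end = strpos($text, $last_symbol);
// Add the byte length of $last_symbol so a multi-byte last symbol is kept whole.
$len = $position_end - $position_begin + strlen($last_symbol);
$st = substr($text, $position_begin, $len);

This cuts only on character boundaries, provided both symbols are found.

No mb_substr needed.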

ata
usergio
  • Cool if you know which chars you want to cut. If you want to have only, let's say the first 3 chars of a random string, it's no good. The correct way is with mb_substr. – Eir Jun 25 '17 at 19:15
2

If you need the string's length to work out how much of it to return, and your string $word is UTF-8 encoded, use mb_strlen() instead of strlen():

$foo = mb_substr($word, 0, mb_strlen($word)-1);
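
A shorter variant, as a sketch: mb_substr() also accepts a negative length, which counts back from the end of the string, so the mb_strlen() call can be dropped. This assumes mbstring is enabled and the file is saved as UTF-8; the sample word is taken from the question's string.

```php
<?php
$word = "смазок";

// Explicit length: all characters except the last one.
$viaLength = mb_substr($word, 0, mb_strlen($word, "UTF-8") - 1, "UTF-8");

// Negative length: stop one character before the end.
$viaNegative = mb_substr($word, 0, -1, "UTF-8");

var_dump($viaLength === $viaNegative); // bool(true)
echo $viaNegative, "\n";               // смазо
```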

Guga Nemsitsveridze
0

I hope this solution helps you as much as it helped me.

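<?php
// Strip tags and newlines once so the length check and the cut use the same text.
// "\n" must be double-quoted; '\n' would only match a literal backslash followed by n.
$text = str_replace("\n", '', strip_tags($post->post_content));
if (mb_strlen($text, 'UTF-8') > 200) {
    // Cut by characters, not bytes, then mark the truncation.
    echo mb_substr($text, 0, 200, 'UTF-8') . '…';
} else {
    echo $text;
}
?>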
trincot
Jodyshop