
Our PHP page is just a UTF-8 web page with Chinese characters in its meta descriptions.

When someone shared its link on WhatsApp, the preview showed garbled characters, and I don't know why.

But when I shared the same link myself, it displayed normally.

What could be causing this? We already added both:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

and

header('Content-Type: text/html; charset=UTF-8');

Does anyone have a clue? Thanks!

==========


  • duplicate of http://stackoverflow.com/questions/279170/utf-8-all-the-way-through – Iłya Bursov May 16 '17 at 17:42
  • I don't know why, but [this says](https://richpreview.com/?url=http%3A%2F%2Fentrepreneur-times.com%2Fl%2Ftch%2Fblog%2F%3Fid%3D12) your meta description tag is not found. Perhaps it's generated incorrectly because of UTF-8 issues? – sgr12 May 16 '17 at 18:08
  • Change your charset; setting it at the document level will hopefully do the trick, though you might have to look further up... Wait, the language attribute! Would that help? – admcfajn May 21 '17 at 05:53

1 Answer


The software in use (let's say blogging software) does not handle the UTF-8 encoded content correctly, which results in invalid UTF-8 output to the browser.

The blogging software is not flawed in all of its content operations; quite the opposite, only some of them are affected. But the flaw shows up on every page I've looked at there, and it is enough to make a simple UTF-8 validity check fail:

$ curl -s 'http://entrepreneur-times.com/l/tch/blog/?id=12' \
  | php -r 'var_dump(preg_match("~~u", file_get_contents("php://stdin")));'
bool(false)
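To narrow down *where* a page breaks, the same `preg_match("~~u", ...)` trick can be applied line by line. The following is a minimal sketch; `invalid_utf8_lines()` is a hypothetical helper and the sample HTML is made up, with the `content` attribute deliberately ending in the first two of the three bytes of "文".

```php
<?php
// Hypothetical helper: report which lines of an HTML string are not valid UTF-8.
// Splitting on "\n" is safe because byte 0x0A never occurs inside a
// multi-byte UTF-8 sequence.

/** @return int[] 1-based numbers of the lines that are not valid UTF-8 */
function invalid_utf8_lines(string $html): array
{
    $bad = [];
    foreach (explode("\n", $html) as $i => $line) {
        // With the /u modifier, preg_match() returns false (instead of 0 or 1)
        // when the subject is not well-formed UTF-8.
        if (preg_match('~~u', $line) === false) {
            $bad[] = $i + 1;
        }
    }
    return $bad;
}

// Made-up sample: line 2 contains "中" followed by an incomplete "文" (E6 96, missing 87).
$html = "<title>中文</title>\n"
      . "<meta name=\"description\" content=\"中\xE6\x96\">\n"
      . "<p>ok</p>";

print_r(invalid_utf8_lines($html)); // lists line 2 only
```

Run against a saved copy of the page, a check like this would point straight at the meta tags.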

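The problem lies in how the description texts are generated (the HTML meta tags for description and og:description). That part of the software does not take the content's UTF-8 encoding into account and most likely just cuts it off at a fixed byte length (I haven't seen the code). In UTF-8 a Chinese character takes three bytes, so cutting at an arbitrary byte offset can split a character in the middle and leave an incomplete byte sequence, which breaks the UTF-8 output.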
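Since the blog software's code is unknown, here is a minimal sketch that only reproduces the suspected mechanism with a made-up description string: byte-based `substr()` versus character-based `mb_substr()` (assuming the *mbstring* extension is available).

```php
<?php
// Reproduce the suspected bug: truncating by bytes splits a multi-byte character.
$description = "中文描述"; // 4 characters, 3 bytes each in UTF-8 = 12 bytes

$byBytes = substr($description, 0, 4);             // first 4 BYTES
$byChars = mb_substr($description, 0, 2, 'UTF-8'); // first 2 CHARACTERS

var_dump(strlen($description));                 // int(12)
var_dump(bin2hex($byBytes));                    // "e4b8ade6": "中" plus a stray lead byte of "文"
var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump($byChars, mb_check_encoding($byChars, 'UTF-8')); // "中文", bool(true)
```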

The fix here is to remove that flaw from the software, so that the description is truncated by characters rather than by bytes.
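A minimal sketch of such a fix, assuming the *mbstring* extension is enabled; `make_description()` and its 150-character default are hypothetical choices, not taken from the blog software. The output is also escaped with `htmlspecialchars()` before it goes into the attribute.

```php
<?php
// Hypothetical helper: build a meta description without splitting characters.
// Assumes $text is already valid UTF-8 and $maxChars >= 1.
function make_description(string $text, int $maxChars = 150): string
{
    $text = trim($text);
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    // Cut by characters (code points), not bytes, so no multi-byte sequence
    // is split; reserve one character for the ellipsis.
    return mb_substr($text, 0, $maxChars - 1, 'UTF-8') . '…';
}

$desc = make_description("中文描述，用于分享预览。", 5); // "中文描述…"
echo '<meta name="description" content="'
   . htmlspecialchars($desc, ENT_QUOTES, 'UTF-8') . '">', "\n";
```

mbstring's `mb_strimwidth()` is an alternative, but it measures display width, where CJK characters count as 2, so the limit means something different.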

hakre
  • @Vanson Wing Leung: A Stack Overflow example is here: https://stackoverflow.com/a/9087570/367456 - And if you need more control, check [PHP's *intl* library](http://php.net/manual/en/book.intl.php). – hakre May 23 '17 at 15:59
  • Fixed, thanks! Moral: Always use mb_substr instead of substr when trying to trim down strings – Vanson Wing Leung May 28 '17 at 14:35
  • @VansonWingLeung: Here is a DOMText-based variant: https://3v4l.org/YOvKK#v500 - Just to show there is more than one way to achieve this. The XML extension is often available. – hakre May 29 '17 at 19:44