
I want to count tweet length the way Twitter does. I tried mb_strlen, strlen and the other variants listed below.

The problem is that Twitter counts "✌️ @mention" as 15, but I get the results below. I don't know how Twitter counts emoji or how to approach this in PHP.

My result:

strlen: 27
mb_strlen UTF-8: 14
mb_strlen UTF-16: 13
iconv UTF-16: 14
iconv UTF-16: 27
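The numbers differ because each function measures a different unit: `strlen` counts UTF-8 bytes, `mb_strlen($s, 'UTF-8')` counts Unicode code points, and a UTF-16 length counts 16-bit code units. A minimal sketch, assuming the text is exactly U+270C VICTORY HAND, U+FE0F VARIATION SELECTOR-16, a space and `@mention` (written with `\u{...}` escapes so the bytes are unambiguous). The helper name `lengths` is hypothetical and the mbstring extension is assumed to be installed:

```php
<?php
// Hypothetical helper: reports three common "lengths" of a UTF-8 string.
// Assumes the mbstring extension is available.
function lengths(string $utf8): array
{
    return [
        // Number of bytes in the UTF-8 encoding.
        'bytes' => strlen($utf8),
        // Number of Unicode code points.
        'code_points' => mb_strlen($utf8, 'UTF-8'),
        // Number of 16-bit code units after converting to UTF-16
        // (code points above U+FFFF take two units, a surrogate pair).
        'utf16_units' => intdiv(strlen(mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8')), 2),
    ];
}

// U+270C VICTORY HAND + U+FE0F VARIATION SELECTOR-16 + " @mention"
$text = "\u{270C}\u{FE0F} @mention";
print_r(lengths($text));
```

For this string the sketch reports 15 bytes, 11 code points and 11 UTF-16 units, because both characters before the space are in the Basic Multilingual Plane. An emoji above U+FFFF, such as U+1F600, adds 4 bytes, 1 code point and 2 UTF-16 units. None of these units is Twitter's weighted length.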
Ali Akbar Azizi

2 Answers


From Twitter's developer documentation:

For programmers with experience in Unicode processing the short answer to the question is that Tweet length is measured by the number of codepoints in the NFC normalized version of the text.

So to calculate the length of a tweet in PHP, you would first normalize the text using Normalization Form C (NFC) and then count the number of codepoints (NOT CHARACTERS) in the normalized text.

```php
$text = "✌️ @mention";

// Get the normalized text in UTF-8
$NormalizedText = Normalizer::normalize($text, Normalizer::FORM_C);

// Now we can calculate the number of codepoints in this normalized text
$it = IntlBreakIterator::createCodePointInstance();
$it->setText($NormalizedText);

$len = 0;
foreach ($it as $codePoint) {
    $len++;
}

echo "Length = $len"; // Result: Length = 15
```
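Traversing an `IntlBreakIterator` yields boundary offsets starting at 0 (expressed in UTF-8 bytes), so a loop that counts the yielded values returns one more than the number of code points, and the last yielded value is the byte length rather than the code-point count. A minimal sketch that counts NFC code points directly, assuming the intl and mbstring extensions are installed; the function names are hypothetical. It implements only the NFC code-point rule quoted above and does not apply Twitter's weighting (under which emoji count as two characters), so it will not match Twitter for text containing emoji:

```php
<?php
// Hypothetical helpers; require the intl (Normalizer, IntlBreakIterator)
// and mbstring extensions.

function nfcCodePointCount(string $text): int
{
    // Normalize to NFC so that, e.g., "e" + U+0301 becomes the single code point "é".
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    // mb_strlen with UTF-8 counts code points, not bytes.
    return mb_strlen($normalized, 'UTF-8');
}

function nfcCodePointCountViaIterator(string $text): int
{
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($normalized);

    // The iterator yields every boundary, including offset 0 before the first
    // code point, so n code points produce n + 1 boundaries.
    $boundaries = 0;
    foreach ($it as $offset) {
        $boundaries++;
    }
    return max(0, $boundaries - 1);
}

echo nfcCodePointCount("\u{270C}\u{FE0F} @mention"), "\n"; // 11
echo nfcCodePointCount("e\u{0301}"), "\n";                  // 1
```

The `mb_strlen` version is the simpler choice; the iterator version is shown only to make the off-by-one correction explicit.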
Sherif
  • Thanks, I was using normalization but counting characters. Thanks again, it works. – Ali Akbar Azizi Jan 26 '20 at 08:30
  • 1
    Close, but off by 1. The iterator returns the breakpoints, starting with 0, which is to the left of the first character. So if given "ABC", it would return 0, 1, 2, 3. You can remove `$len` and the length is simply the last `$codepoint` returned. "Traversing an IntlBreakIterator yields non-negative integer values representing the successive locations of the text boundaries, expressed as UTF-8 code units (byte) counts, taken from the beginning of the text (which has the location 0). The keys yielded by the iterator simply form the sequence of natural numbers {0, 1, 2, …}. " – Dan Chadwick Apr 04 '21 at 02:41
  • 1
    This is wildly incorrect. According to the Twitter documentation you link to, "emojis always count as two characters, regardless of combining modifiers". Your code takes no account of this. An example they give is a family emoji (person emoji joined by zero-width joiners) having a length of 2. Your method says 8. – WebSmithery Jun 03 '21 at 09:34
  • You're not wrong. They do indeed ignore the conjoining glyphs. Feel free to make an edit if you have a better solution for this. – Sherif Jul 07 '21 at 18:23

@Sherif's answer doesn't work in some cases. I found this library, nojimage/twitter-text-php, which works perfectly.

Here is my code:

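```php
use Twitter\Text\Parser;

// Assumes the library was installed with: composer require nojimage/twitter-text-php
require 'vendor/autoload.php';

// weightedLength applies Twitter's character weighting to the caption.
$validator = Parser::create()->parseTweet($caption);

if ($validator->weightedLength > 280) {
    // MessageException is this application's own exception class.
    throw new MessageException("Maximum post length is 280 characters.");
}
```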
Ali Akbar Azizi