
I have a program, an editor for Twitter tweets, that counts the text to keep it within Twitter's 280-character limit.

I use the utf8 property for that, like this:

let str = "℞"
let r = str.utf8.count

The result is 3.

This symbol (℞), and others like it, takes only 2 characters in Twitter's counter, but this code gives me 3, so I can't show the user the exact character count!

How can I get the correct count, 2?

Nayef

1 Answer


Counting characters

Tweet length is measured by the number of codepoints in the NFC normalized version of the text.

In Swift, you can get the NFC normalized form through precomposedStringWithCanonicalMapping, and the number of codepoints by unicodeScalars.count.
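To illustrate why the normalization step matters, here is a minimal sketch using a decomposed "é" (my own example, not taken from the Twitter documentation). The same visible text has a different codepoint count before and after NFC normalization:

```swift
import Foundation

// "é" written as a base letter followed by a combining acute accent (decomposed form).
let decomposed = "e\u{0301}"

// NFC normalization composes the pair into the single code point U+00E9.
let composed = decomposed.precomposedStringWithCanonicalMapping

print(decomposed.unicodeScalars.count) // 2
print(composed.unicodeScalars.count)   // 1
print(decomposed.count)                // 1 (Character, i.e. grapheme cluster, count)
```

Without the normalization, the count would depend on how the user's keyboard or pasted text happened to encode accented letters.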

So, the right code in Swift should be like this:

let str = "℞"
let r = str.precomposedStringWithCanonicalMapping.unicodeScalars.count
print(r) //->1

The code above gives a result consistent with some character counters on the web, so I do not understand why you get 2 for ℞.


(Thanks to Rakesha Shastri.) I believe the code above correctly implements the specification described in the documentation I linked above.

However, it has been reported that Twitter itself does not behave exactly as documented. (Sorry, I do not tweet myself.) We may need to guess, or find another reliable source, to match Twitter's actual behavior.


I tried the official twitter-text Tweet parsing library, but it gives the same result as my code.

let len = TwitterText.tweetLength(str)
print(len) //->1

(The code of TwitterText.tweetLength(_:) is far more complex, though, because it handles t.co links. So when the text includes URLs, it produces different output from my code.)


(UPDATE)

I cannot be sure, because the Twitter apps you mention are not open source, but I guess they show the weighted length described on the twitter-text Tweet parsing library page linked above.

You may need to write something like this, after importing the library with CocoaPods.

let config = TwitterTextConfiguration(fromJSONResource: kTwitterTextParserConfigurationV2)
let parser = TwitterTextParser(configuration: config)
let result = parser.parseTweet(str)
print(result.weightedLength) //->2
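If you want to see roughly what the weighted length does without pulling in the library, here is a minimal sketch. It is only an approximation under stated assumptions: the ranges, weights and scale below are what I understand the v2 configuration to contain (please check them against the JSON file shipped with the library), `approximateWeightedLength` is a hypothetical helper name of my own, and URL handling is ignored entirely.

```swift
import Foundation

// ASSUMPTION: code point ranges that weigh 100 in the v2 configuration,
// copied by hand. Every other code point gets the default weight of 200.
let lightRanges: [ClosedRange<UInt32>] = [0...4351, 8192...8205, 8208...8223, 8242...8247]
let lightWeight = 100
let defaultWeight = 200
let scale = 100

/// Hypothetical helper: sums per-code-point weights over the NFC form of the
/// text and divides by the scale. It does not treat URLs specially.
func approximateWeightedLength(_ text: String) -> Int {
    let normalized = text.precomposedStringWithCanonicalMapping
    var total = 0
    for scalar in normalized.unicodeScalars {
        let isLight = lightRanges.contains { $0.contains(scalar.value) }
        total += isLight ? lightWeight : defaultWeight
    }
    return total / scale
}

print(approximateWeightedLength("℞"))     // 2 (U+211E is outside the light ranges)
print(approximateWeightedLength("hello")) // 5
```

Under these assumptions ℞ counts as 2 because its code point, U+211E, falls outside the ranges that weigh 100. For real use, prefer the library, since it also handles links.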
OOPer
  • I read the same document and thought this was the solution. But if you go to twitter and type the character, it counts it as 2, not 1. – Rakesha Shastri Oct 07 '18 at 13:43
  • @RakeshaShastri, thanks. It may be that some actual implementations do more than is documented. I have added a note on it. – OOPer Oct 07 '18 at 13:46
  • This symbol takes 2 characters in the Twitter app on iPhone, and also in the Tweetbot app! – Nayef Oct 07 '18 at 20:45
  • @user2713544, thank you. I have been searching the web since Rakesha Shastri told me that actual Twitter shows 2 for your example. But sorry, I have not found any clue so far. – OOPer Oct 07 '18 at 20:49
  • If you are really sure you have included the framework into **Linked Libraries and Frameworks** and got that error, I cannot tell you how to fix it. You may need to include all source files and resource files into your project. – OOPer Oct 10 '18 at 07:48
  • Thank you. I used the library linked above and it solved my problem. – Nayef Oct 10 '18 at 08:18
  • You're right that Twitter's docs are wrong. They claim to count *Unicode code points*, but really count *UTF-16 code units*. (This is the way that string lengths are measured in JavaScript and C#, by the way; I suspect that on their backend they're measuring string lengths in a language that measures string lengths in UTF-16 code units and whoever wrote the docs has no idea that these aren't the same thing as Unicode code points.) I'm not a Swift dev so not sure how best to measure a string's UTF-16 code unit length in Swift; I'll leave that bit to you. – Mark Amery Dec 16 '18 at 16:53
  • BTW, since there doesn't seem to be an official way to report bugs that I can find, I signed up for a Twitter account and Tweeted at @TwitterSupport about this docs bug: https://twitter.com/XplodingCabbage/status/1074355570512142336. We'll see if anything comes of it. – Mark Amery Dec 16 '18 at 17:32
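For reference, on the question of how to measure UTF-16 code units in Swift: a minimal sketch comparing the standard string views is below. The emoji is my own example, added only to show a case where the UTF-16 and codepoint counts differ.

```swift
let rx = "℞"      // U+211E, inside the Basic Multilingual Plane
let emoji = "😀"  // U+1F600, outside the BMP, so UTF-16 needs a surrogate pair

print(rx.utf8.count)              // 3 (UTF-8 bytes)
print(rx.utf16.count)             // 1 (UTF-16 code units)
print(rx.unicodeScalars.count)    // 1 (code points)

print(emoji.utf16.count)          // 2
print(emoji.unicodeScalars.count) // 1
```

Note that ℞ is a single UTF-16 code unit, so counting UTF-16 code units alone would not produce the 2 reported in the question.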