Given a Unicode string and these requirements:
- The string must be encoded into some byte-sequence format (e.g. UTF-8 or JSON Unicode escapes)
- The encoded string has a maximum length in bytes
For example, the iPhone push service requires JSON encoding with a maximum total packet size of 256 bytes.
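To make the constraint concrete, here is a small Python 3 sketch (the function name `encoded_sizes` is made up for illustration) showing how far the encoded size can drift from the character count. It assumes the JSON encoder escapes every non-ASCII character as `\uXXXX`, which is what Python's `json.dumps` does by default.

```python
import json

def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, JSON-escaped bytes) for text."""
    utf8_len = len(text.encode("utf-8"))
    # json.dumps escapes non-ASCII as \uXXXX by default (ensure_ascii=True).
    # The count includes the two surrounding double quotes.
    json_len = len(json.dumps(text).encode("ascii"))
    return len(text), utf8_len, json_len

print(encoded_sizes("héllo"))
print(encoded_sizes("\U0001F600"))  # one emoji outside the BMP
```

A single emoji is one code point but 4 bytes in UTF-8 and 12 bytes as a JSON surrogate-pair escape, so a limit expressed in bytes cannot be enforced by counting characters.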
What is the best way to truncate the string so that the truncated result re-encodes to valid Unicode and displays reasonably correctly?
(Human-language comprehension is not necessary: the truncated version can look odd, e.g. with an orphaned combining character or a Thai vowel, as long as the software doesn't crash when handling the data.)
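For reference, here is a rough Python 3 sketch of the kind of thing I have in mind. The names `truncate_utf8` and `truncate_json_escaped` are hypothetical, and the JSON variant assumes the limit applies to the quoted string literal produced by `json.dumps` with its default `ensure_ascii=True`, not to the whole packet. Neither function tries to keep combining sequences together, which is acceptable per the note above.

```python
import json

def truncate_utf8(text, max_bytes):
    """Longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the bytes may split a multi-byte sequence at the end;
    # errors="ignore" drops that incomplete tail when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

def truncate_json_escaped(text, max_bytes):
    """Longest prefix whose json.dumps() literal fits in max_bytes."""
    used = 2  # opening and closing double quote
    kept = []
    for ch in text:
        # Cost of this one character once escaped, without its quotes.
        # A non-BMP character becomes a 12-byte surrogate-pair escape,
        # and it is kept or dropped as a whole.
        cost = len(json.dumps(ch)) - 2
        if used + cost > max_bytes:
            break
        used += cost
        kept.append(ch)
    return "".join(kept)

print(truncate_utf8("héllo", 2))          # cut falls inside "é"
print(truncate_json_escaped("héllo", 9))  # budget includes the quotes
```

The UTF-8 version relies on the input being a valid string, so only the final sequence can be damaged by the cut. The JSON version walks code points rather than bytes so that an escape such as `\u00e9` is never split in the middle.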
See Also:
- Related Java question: How do I truncate a java string to fit in a given number of bytes, once UTF-8 encoded?
- Related JavaScript question: Using JavaScript to truncate text to a certain size