Does this dot '•' count as a valid UTF-8 character or not?

Question

In this text, does the dot '•' count as a valid UTF-8 character, even though it takes 3 bytes, unlike the other characters, which take a single byte each?

ABCDEFGHIJ•XYZ
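One way to confirm the byte counts for this sample is to encode it and inspect the result. This is a minimal sketch, assuming C# 9 or later (top-level statements) and assuming the dot in the sample is U+2022 (BULLET), written here as an escape so the source file's own encoding does not matter.

```csharp
using System;
using System.Text;

// The sample line from the question; \u2022 is the bullet character.
string text = "ABCDEFGHIJ\u2022XYZ";

// Encode to UTF-8 and compare the character count with the byte count.
byte[] bytes = Encoding.UTF8.GetBytes(text);
Console.WriteLine($"chars: {text.Length}, bytes: {bytes.Length}"); // chars: 14, bytes: 16

// Show the bytes produced for the bullet alone.
byte[] bullet = Encoding.UTF8.GetBytes("\u2022");
Console.WriteLine(BitConverter.ToString(bullet)); // E2-80-A2
```

The 13 ASCII letters take one byte each and the bullet takes three, which accounts for the 16 bytes.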

Every Unicode character is a valid UTF-8 character. In UTF-8 encoding a Unicode character can be between 1 and 4 bytes long. — ckuri, Feb 13 '20 at 12:10
Thanks for pointing that out. But is there a calculation, something like: if the value falls within such-and-such ranges, then those are UTF-8 characters? — BeeGees, Feb 13 '20 at 12:18
@BeeGees The [Wikipedia page for UTF-8](https://en.wikipedia.org/wiki/UTF-8) has a good description of how it works. Code points between U+0000 and U+007F use a single byte. Those between U+0080 and U+07FF use two bytes. And so on — canton7, Feb 13 '20 at 12:21
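The range rule described in this comment can be sketched as a small lookup. `Utf8ByteCount` is a hypothetical helper name, not a framework API, and the example assumes C# 9 or later for top-level statements. The two ranges past "and so on" are U+0800 to U+FFFF (three bytes) and U+10000 to U+10FFFF (four bytes); the surrogate range U+D800 to U+DFFF is excluded because UTF-8 does not encode surrogates.

```csharp
using System;

// Number of bytes UTF-8 uses for one Unicode code point (hypothetical helper).
static int Utf8ByteCount(int codePoint)
{
    if (codePoint < 0 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint));
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint), "Surrogates are not encodable in UTF-8.");

    if (codePoint <= 0x7F) return 1;    // U+0000..U+007F
    if (codePoint <= 0x7FF) return 2;   // U+0080..U+07FF
    if (codePoint <= 0xFFFF) return 3;  // U+0800..U+FFFF
    return 4;                           // U+10000..U+10FFFF
}

Console.WriteLine(Utf8ByteCount('A'));    // 1
Console.WriteLine(Utf8ByteCount(0x2022)); // 3 (the bullet)
```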
Also, regarding your [previous question](https://stackoverflow.com/questions/60202410/why-is-streamreader-and-sr-basestream-seek-giving-junk-characters-even-in-utf8), UTF-8 is a [self-synchronizing code](https://en.wikipedia.org/wiki/Self-synchronizing_code#Note), so you can check whether a byte `b` is the start of a UTF-8 character with `(b & 0b1000_0000) == 0 || (b & 0b1100_0000) == 0b1100_0000`. — ckuri, Feb 13 '20 at 12:21
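As a rough illustration of that start-byte test, the sketch below backs up from an arbitrary offset to the first byte of the enclosing character, which is the situation that arises after seeking into the middle of a stream. `IsUtf8StartByte` and `AlignToCharStart` are hypothetical names, and C# 9 or later is assumed for top-level statements. Note the parentheses around each `&`: in C#, `==` binds more tightly than `&`. The test assumes the data is already well-formed UTF-8; it does not reject bytes such as 0xFF that can never appear in valid input.

```csharp
using System;
using System.Text;

// True if b starts a UTF-8 sequence: 0xxxxxxx (ASCII) or 11xxxxxx (lead byte).
// Continuation bytes have the form 10xxxxxx and fail both tests.
static bool IsUtf8StartByte(byte b) =>
    (b & 0b1000_0000) == 0 || (b & 0b1100_0000) == 0b1100_0000;

// Move back from an in-range offset to the start of the enclosing character.
static int AlignToCharStart(byte[] data, int offset)
{
    while (offset > 0 && !IsUtf8StartByte(data[offset]))
        offset--;
    return offset;
}

byte[] data = Encoding.UTF8.GetBytes("ABCDEFGHIJ\u2022XYZ");

// Offset 11 is the second byte of the bullet (0x80); its lead byte is at 10.
Console.WriteLine(AlignToCharStart(data, 11)); // 10
```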
Can I get a link to some working C# code that checks very quickly whether a text consists entirely of valid UTF-8 characters? — BeeGees, Feb 13 '20 at 12:37
@BeeGees `new UTF8Encoding(false, true).GetString(bytes);` throws an exception if `bytes` is not valid UTF-8 — canton7, Feb 13 '20 at 12:45
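That one-liner can be wrapped into a yes/no check along these lines. `IsValidUtf8` is a hypothetical helper name, and C# 9 or later is assumed for top-level statements. The sketch relies on the strict decoder throwing `DecoderFallbackException` for malformed input, and it makes no claim about being the fastest possible approach.

```csharp
using System;
using System.Text;

// Returns true if the bytes decode as well-formed UTF-8 (hypothetical helper).
static bool IsValidUtf8(byte[] bytes)
{
    // throwOnInvalidBytes: true makes the decoder throw instead of
    // silently substituting U+FFFD for malformed sequences.
    var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                  throwOnInvalidBytes: true);
    try
    {
        strict.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}

Console.WriteLine(IsValidUtf8(new byte[] { 0xE2, 0x80, 0xA2 })); // True: the bullet
Console.WriteLine(IsValidUtf8(new byte[] { 0x95 }));             // False: lone continuation byte
```

The second input is how Windows-1252 stores a bullet, so a file saved in that encoding fails the check.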
For my own knowledge: what does `encoderShouldEmitUTF8Identifier = true` mean (you passed false here)? — BeeGees, Feb 13 '20 at 13:01
@BeeGees Please read [the docs](https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=netframework-4.8#System_Text_UTF8Encoding__ctor_System_Boolean_System_Boolean_) — canton7, Feb 13 '20 at 13:18
@canton7 I know what it does. The point is, why would you want to emit the UTF-8 BOM in practice? Where is it actually used? — BeeGees, Feb 13 '20 at 13:31
@BeeGees E.g. so that a text editor reading a file knows what encoding the file is in — canton7, Feb 13 '20 at 13:33
@BeeGees As an example, if the UTF-8 BOM is missing from a text file, then for backward compatibility Microsoft Excel (and many other Windows programs) will assume that file is encoded in localized ANSI encoding instead of UTF-8. — Mark Tolonen, Feb 13 '20 at 17:14
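To make the BOM discussion concrete, here is a small sketch of what the `encoderShouldEmitUTF8Identifier` flag changes. It assumes C# 9 or later for top-level statements. The flag only affects the preamble that `GetPreamble()` reports, which writers such as `StreamWriter` place at the start of a file; it does not change what `GetBytes` returns.

```csharp
using System;
using System.Text;

var withBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
var withoutBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

// The UTF-8 BOM is the three bytes EF BB BF.
Console.WriteLine(BitConverter.ToString(withBom.GetPreamble())); // EF-BB-BF
Console.WriteLine(withoutBom.GetPreamble().Length);              // 0

// The encoded text itself is identical either way.
Console.WriteLine(withBom.GetBytes("A").Length);                 // 1
```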

score 1 · Accepted Answer · answered Feb 13 '20 at 12:10

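Why not? It is a valid character. Note that '•' is BULLET (U+2022), which UTF-8 encodes as the three bytes E2 80 A2. MESSAGE WAITING (U+0095) is a different character, a C1 control code that takes two bytes in UTF-8; the byte 0x95 maps to the bullet only in Windows-1252.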

http://www.fileformat.info/info/charset/UTF-8/list.htm

Vitalij Draba


Thanks for the extensive list. Regards. But is there a calculation, something like: if the value falls within such-and-such ranges, then those are UTF-8 characters? – BeeGees Feb 13 '20 at 12:17
