I have a string with some wonky characters (for example, " "). I need to check whether a List contains the first item in the string, but when I index it, the result is always `\ud835`. I've tried `Char.ConvertFromUtf32(\ud835)` and a few other things, but I can't figure out how to get the first item as a "".
Asked by Adam Dernis
I'm not following. `'\ud835'` is a "high surrogate" and not a valid character by itself. Is your string "Lead Backend" rendered in a wonky font, or is that leading character really wonky and represented by a Unicode surrogate pair? – Flydog57 Aug 10 '18 at 23:14

@Flydog57 It's represented by a Unicode surrogate pair. – Adam Dernis Aug 10 '18 at 23:15

This might help: https://stackoverflow.com/questions/14347799/how-do-i-create-a-string-with-a-surrogate-pair-inside-of-it. Otherwise, search around for "surrogate pairs". I've never had to work with them. – Flydog57 Aug 10 '18 at 23:20

What is the "first item from the string"? – Jacob Krall Aug 11 '18 at 00:04

@JacobKrall "" is the first item. – Adam Dernis Aug 11 '18 at 00:07
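Following Flydog57's pointer, here is a minimal C# sketch of the ways to build a string that contains a surrogate pair. It assumes, purely for illustration, that the character is U+1D40B MATHEMATICAL BOLD CAPITAL L, whose UTF-16 form is `\ud835\udc0b`. The actual character from the question was lost, but every character in the U+1D400 to U+1D7FF range starts with the same high surrogate `\ud835`. The last part shows why `Char.ConvertFromUtf32` cannot help with `0xD835`: a surrogate on its own is not a Unicode scalar value, so the call throws.

```csharp
using System;

// Assumed stand-in for the lost character: U+1D40B MATHEMATICAL BOLD CAPITAL L.
string fromEscapes = "\ud835\udc0b";                  // one \u escape per UTF-16 code unit
string fromEscape32 = "\U0001D40B";                   // one \U escape with 8 hex digits
string fromCodePoint = char.ConvertFromUtf32(0x1D40B); // built from the code point

Console.WriteLine(fromEscapes == fromEscape32);  // True
Console.WriteLine(fromEscapes == fromCodePoint); // True
Console.WriteLine(fromEscapes.Length);           // 2

// A lone high surrogate is not a valid code point, so ConvertFromUtf32 rejects it.
bool rejected = false;
try
{
    char.ConvertFromUtf32(0xD835);
}
catch (ArgumentOutOfRangeException)
{
    rejected = true;
}
Console.WriteLine(rejected);                     // True
```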
1 Answer
The character is represented with a surrogate pair in UTF-16, the encoding .NET uses for strings.
A surrogate pair occupies two `char` values:
var s = " ";
Console.WriteLine(new string(new[] { s[0], s[1] }) == "");
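A runnable version of the comparison above, assuming the lost character is U+1D40B (`\ud835\udc0b`) and that the string reads "Lead Backend" as Flydog57 guessed; both the character and the variable names are illustrative. It shows that `s[0]` alone is only the high surrogate, while the two code units together reproduce the full character.

```csharp
using System;

// Hypothetical stand-in for the question's string: U+1D40B followed by "ead Backend".
string s = "\ud835\udc0bead Backend";
string firstChar = "\ud835\udc0b";

// s[0] is only the high surrogate, so it never equals the full character.
Console.WriteLine(s[0].ToString() == firstChar);                  // False

// Combining the code units at s[0] and s[1] recovers the character.
Console.WriteLine(new string(new[] { s[0], s[1] }) == firstChar); // True
Console.WriteLine(s.Substring(0, 2) == firstChar);                // True
```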
There are built-in helper methods such as `Char.ConvertToUtf32` and `Char.IsSurrogate` that you can use to detect whether you are in this situation.
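A sketch of how those helpers fit together. `FirstCodePoint` is a hypothetical helper written for this example, not a framework method: it uses `Char.IsSurrogatePair` to check whether the string starts with a complete pair, then `Char.ConvertToUtf32` and `Char.ConvertFromUtf32` to go through the code point. The sample string again assumes U+1D40B as the first character.

```csharp
using System;

// Hypothetical helper: returns the first Unicode code point of s as a string,
// i.e. one char normally, or two chars when s starts with a valid surrogate pair.
static string FirstCodePoint(string s)
{
    if (string.IsNullOrEmpty(s))
        throw new ArgumentException("String must not be empty.", nameof(s));

    if (char.IsSurrogatePair(s, 0))              // s[0] high surrogate, s[1] low surrogate
    {
        int codePoint = char.ConvertToUtf32(s, 0); // combines s[0] and s[1]
        return char.ConvertFromUtf32(codePoint);   // back to a two-char string
    }
    return s[0].ToString();
}

string s = "\ud835\udc0bead Backend"; // assumed example: U+1D40B + "ead Backend"

Console.WriteLine(char.IsSurrogate(s, 0));                  // True
Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X")); // 1D40B
Console.WriteLine(FirstCodePoint(s) == "\ud835\udc0b");     // True
Console.WriteLine(FirstCodePoint("Lead"));                  // L
```

This handles a single code point only. For characters built from several code points (for example a letter plus a combining accent), the `StringInfo` approach in the last comment is the better fit.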

Answered by Jacob Krall
Sorry, I'd missed that. But how do I get "" back after indexing for the first value? "\ud835" != "", so that doesn't work. – Adam Dernis Aug 11 '18 at 00:47

Please read my answer. ***Two*** indexes are required to get "" back. – Jacob Krall Aug 11 '18 at 00:48

Yep, thank you. I just looked at .Count() of the string " " and understand now. Thanks again. – Adam Dernis Aug 11 '18 at 00:49
@AvishaiDernis You can use things like `var first = StringInfo.GetNextTextElement(" ");` or do `var info = new StringInfo(" "); var first = info.SubstringByTextElements(0, 1);` to get out the entire first Unicode character `""` and not just half of the surrogate pair. – Jeppe Stig Nielsen Mar 05 '19 at 07:28
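Jeppe Stig Nielsen's suggestion can be sketched as a small C# program using `System.Globalization.StringInfo`, again assuming U+1D40B as a stand-in for the lost character. It also shows the count Adam noticed: `Length` (like `.Count()`) counts UTF-16 code units, whereas `LengthInTextElements` counts text elements, roughly the user-perceived characters. It ends with the check from the question against a hypothetical list named `candidates`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Assumed example string: U+1D40B followed by "ead".
string s = "\ud835\udc0bead";

// Option 1: the first text element of the string.
string first = StringInfo.GetNextTextElement(s);

// Option 2: view the string as a sequence of text elements.
var info = new StringInfo(s);
string firstAgain = info.SubstringByTextElements(0, 1);

Console.WriteLine(first == "\ud835\udc0b");      // True
Console.WriteLine(firstAgain == first);          // True
Console.WriteLine(s.Length);                     // 5 (UTF-16 code units)
Console.WriteLine(info.LengthInTextElements);    // 4 (text elements)

// Hypothetical list from the question: does it contain the first character?
var candidates = new List<string> { "\ud835\udc0b", "X" };
Console.WriteLine(candidates.Contains(first));   // True
```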