1

string x = "hello ​"; Console.WriteLine("\"" + x.Trim() + "\"");

Output: "hello ​" I want the output: "hello​"

How can I deal with such characters? The character in question is U+200B (zero width space).

Rand Random
MaKeSter
  • 4
    [U+8203](https://www.compart.com/en/unicode/U+8203) is not whitespace. Are you sure that's the character you mean? Perhaps you meant [U+200B](https://unicode-explorer.com/c/200B)? – Jon Skeet May 31 '23 at 09:03
  • 2
    To remove all whitespace characters, including the non-breaking space, you can use the Replace() method in combination with regular expressions. – Amul Bhatia May 31 '23 at 09:05
  • @JonSkeet I'm talking about this symbol https://symbl.cc/en/200B/ – MaKeSter May 31 '23 at 09:05
  • 1
    I think you meant U+200B, which is Zero Width Space. Not whitespace according to `char.IsWhiteSpace` https://dotnetfiddle.net/Zo4Le9 also https://www.compart.com/en/unicode/U+200B shows the category as Format not Space Separator – Charlieface May 31 '23 at 09:06
  • 4
    One of the characters *specifically mentioned* in the documentation about [`Trim`](https://learn.microsoft.com/en-us/dotnet/api/system.string.trim?view=net-7.0)? – Damien_The_Unbeliever May 31 '23 at 09:09
  • 1
    If you want to trim extra, non-whitespace characters (U+200B isn't) you can use the `Trim(Char[])` overload and specify the characters you want – Panagiotis Kanavos May 31 '23 at 09:12
  • Given that you've removed the claim that `char.IsWhiteSpace`, you might want to change the title... you certainly haven't given evidence that `Trim()` doesn't remove all whitespace characters. – Jon Skeet May 31 '23 at 09:43
  • @JonSkeet - the evidence would be the user's eyesight: it is whitespace to the human eye, even if the technical specification doesn't reflect this – Rand Random May 31 '23 at 09:45
  • @RandRandom: I still think it's a misleading title, stemming from the original incorrect assertion. It gives the impression that `Trim()` isn't obeying its contract, which simply isn't the case. – Jon Skeet May 31 '23 at 09:46
  • 1
    @RandRandom - For a "zero width space" the human eye doesn't detect any space at all of any colour – Martin Smith May 31 '23 at 10:43
  • @MartinSmith - but this is basically what OP is looking for: remove everything that isn't human readable, therefore OP's new question: https://stackoverflow.com/questions/76372565/get-the-number-of-visible-characters-in-a-string – Rand Random May 31 '23 at 10:45
  • @JonSkeet - just in case you are interested (see OP's new question) – Rand Random May 31 '23 at 10:46
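
The points made in the comments (U+200B is not whitespace for `char.IsWhiteSpace`, and the `Trim(Char[])` overload can be given extra characters) can be sketched roughly as follows. The helper name `TrimSpacesAndZeroWidth` and its character list are assumptions made for illustration, not something from the thread, and the behaviour described is that of current .NET versions.

```csharp
using System;
using System.Globalization;

public static class ZeroWidthTrimDemo
{
    // Hypothetical helper: trims common ASCII whitespace plus U+200B from both ends.
    // The character list is an assumption; extend it with whatever else should go.
    public static string TrimSpacesAndZeroWidth(string s)
        => s.Trim(' ', '\t', '\r', '\n', '\u200B');

    public static void Main()
    {
        string x = "hello \u200B";

        // U+200B is in the Format category, so it is not reported as whitespace.
        Console.WriteLine(char.IsWhiteSpace('\u200B'));
        Console.WriteLine(char.GetUnicodeCategory('\u200B'));

        // Plain Trim() stops at the trailing U+200B, so the space before it survives too.
        Console.WriteLine(x.Trim().Length);

        // Listing the characters explicitly removes both the space and U+200B.
        Console.WriteLine("\"" + TrimSpacesAndZeroWidth(x) + "\"");
    }
}
```

Unlike the parameterless `Trim()`, this overload removes only the characters listed, so any other whitespace you care about has to be added to the list.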

2 Answers

2
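using System.Text.RegularExpressions;

public static class StringExtension
{
    // Matches a run of whitespace and/or U+200B anchored at the start or at the end of the string.
    private static readonly string regExp = "((?=^)[\\s\\u200b]*)|([\\s\\u200b]*(?=$))";

    public static string TrimZSC(this string s)
        => Regex.Replace(s, regExp, "");
}

// Usage (as top-level statements, these go above the class):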
var x = " ​he  ​llo ​"; 
Console.WriteLine(x);
Console.WriteLine(x.TrimZSC());

This trims whitespace and the zero width space (U+200B) at the beginning and at the end of the given string. Characters in between, including any zero width spaces, are left untouched, as they should be by a trim function.
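
As a quick check of that claim, here is a minimal sketch that compares lengths before and after trimming. It uses a simplified pattern that I assume behaves the same as the one above; `TrimCheck` and `TrimEnds` are hypothetical names, and the zero width spaces are written as `\u200B` escapes so that they are visible in the source.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimCheck
{
    // Assumed equivalent of the pattern above: a run of whitespace or U+200B at either end.
    private static readonly Regex Ends = new Regex(@"^[\s\u200B]+|[\s\u200B]+$");

    public static string TrimEnds(string s) => Ends.Replace(s, "");

    public static void Main()
    {
        var x = " \u200Bhe  \u200Bllo \u200B";
        var trimmed = TrimEnds(x);

        // Length before and after trimming.
        Console.WriteLine(x.Length);
        Console.WriteLine(trimmed.Length);

        // The zero width space in the middle of the string is still there.
        Console.WriteLine(trimmed.IndexOf('\u200B') >= 0);
    }
}
```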

1

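You can use Regex to write your own trim extension. For example, this trims all characters in the Unicode mark (M), separator (Z) and other (C) categories at the beginning and end. RegexOptions.Singleline is there to make it behave like the standard Trim, though strictly that option only changes how the dot (.) matches; what keeps ^ and $ anchored to the whole string is that RegexOptions.Multiline is not set.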

I've split it into two expressions because a single one would have O(N^2) worst-case runtime. With GeneratedRegex (available from .NET 7), the performance shouldn't be terrible either.

using System.Text.RegularExpressions;

internal static partial class MyTrimExtensions
{
    public static string TrimMarksSeparatorsOthers(this string toTrim)
    {
        var toTrimFront = TrimMarksSeparatorsOthersFrontRegex().Match(toTrim);
        if (toTrimFront.Length == toTrim.Length)
            return "";

        var toTrimBack = TrimMarksSeparatorsOthersBackRegex().Match(toTrim);
        return toTrim[toTrimFront.Length..toTrimBack.Index];
    }

    [GeneratedRegex("^[\\p{M}\\p{Z}\\p{C}]*", RegexOptions.Singleline)]
    private static partial Regex TrimMarksSeparatorsOthersFrontRegex();
    [GeneratedRegex("[\\p{M}\\p{Z}\\p{C}]*$", RegexOptions.Singleline | RegexOptions.RightToLeft)]
    private static partial Regex TrimMarksSeparatorsOthersBackRegex();
}
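
If GeneratedRegex is not available, the same idea can probably be expressed without Regex at all. The following is only a sketch under that assumption: `CategoryTrim`, `IsTrimmable` and `TrimByCategory` are hypothetical names, and the category list is my reading of what \p{M}, \p{Z} and \p{C} cover. It walks in from both ends once, so it runs in linear time.

```csharp
using System;
using System.Globalization;

public static class CategoryTrim
{
    // True for the general categories M (marks), Z (separators) and C (other).
    // Like the regex, this works on UTF-16 code units, so surrogate halves count as C.
    private static bool IsTrimmable(char c)
    {
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.EnclosingMark:
            case UnicodeCategory.SpaceSeparator:
            case UnicodeCategory.LineSeparator:
            case UnicodeCategory.ParagraphSeparator:
            case UnicodeCategory.Control:
            case UnicodeCategory.Format:
            case UnicodeCategory.Surrogate:
            case UnicodeCategory.PrivateUse:
            case UnicodeCategory.OtherNotAssigned:
                return true;
            default:
                return false;
        }
    }

    public static string TrimByCategory(string s)
    {
        int start = 0;
        int end = s.Length;

        // Skip trimmable characters from the front, then from the back.
        while (start < end && IsTrimmable(s[start])) start++;
        while (end > start && IsTrimmable(s[end - 1])) end--;

        return s.Substring(start, end - start);
    }

    public static void Main()
    {
        Console.WriteLine("\"" + TrimByCategory(" \u200Bhe  \u200Bllo \u200B") + "\"");
    }
}
```

One consequence worth knowing: because surrogates fall under category C, characters outside the Basic Multilingual Plane (most emoji, for example) at either end of the string would be trimmed as well. Drop `UnicodeCategory.Surrogate` from the list if that is not wanted.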
relatively_random