
I have a string obtained from Html Agility Pack; it has already been cleaned and contains no tags:

string cleanText = htmlDoc.DocumentNode.InnerText;

Now my question is: how do I remove all whitespace characters (spaces, newlines, etc.)? Example string:

                                                                                    @Vanni
                                            breitbart.com

                                            #swiat
                                            #usa
                                            #youtube
                                            #technologia
                                            +2 inne






                                    Akcja "They can't silence us" ma związek z pozwem wytoczonym przeciwko YouTube przez kanał PragerU za bezpodstawne zablokowanie konta.

I need a string like this:

@Vannibreitbart.com#swiat#usa#youtube#technologia+2inneAkcja"Theycan'tsilenceus"mazwiązekzpozwemwytoczonymprzeciwkoYouTubeprzezkanałPragerUzabezpodstawnezablokowaniekonta.

MESSIAH

2 Answers


Regex is probably as easy as any:

string compressed = Regex.Replace(bigstring, @"\s+", "");

You could also iterate over it as a char array and append only those chars for which char.IsWhiteSpace() returns false to a StringBuilder.
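As a rough sketch of both options side by side (the class and method names here are made up for illustration, and the sample string is just a shortened stand-in for the one in the question):

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class WhitespaceStripper
{
    // Regex approach: \s+ matches each whole run of whitespace at once.
    public static string WithRegex(string input) =>
        Regex.Replace(input, @"\s+", "");

    // Manual approach: copy every non-whitespace char into a StringBuilder.
    public static string WithStringBuilder(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Hypothetical sample with spaces, tabs and line breaks mixed in.
        string sample = "   @Vanni\r\n\t breitbart.com \n\n #swiat ";
        Console.WriteLine(WhitespaceStripper.WithRegex(sample));
        Console.WriteLine(WhitespaceStripper.WithStringBuilder(sample));
    }
}
```

Both calls should give the same result for ordinary input; which one is faster probably depends on the size of the string, so measure before choosing.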

Caius Jard
  • Use `@"\s+"` and make it greedy. That way only a few hundred steps are involved, whereas your solution takes over a thousand steps; very inefficient. – Trevor Oct 25 '19 at 15:05
  • Also, with the current string, your approach would match over 400 times, whereas `@"\s+"` would match only 20-something times, a significant change; fewer steps == fewer matches and better performance. – Trevor Oct 25 '19 at 15:11
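
To make the match-count point in the comments above concrete, here is a small sketch that counts matches for `\s` versus `\s+`. The sample string and the helper name are invented for the example; the numbers for the string in the question will be different.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchCountDemo
{
    // Hypothetical helper: how many times does the pattern match the input?
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // Four spaces, then three newlines: two runs of whitespace.
        string sample = "a    b\n\n\nc";
        Console.WriteLine(CountMatches(sample, @"\s"));   // 7: one match per whitespace char
        Console.WriteLine(CountMatches(sample, @"\s+"));  // 2: one match per run
    }
}
```

Each match becomes a separate replacement, so matching whole runs means fewer replacements for the regex engine to perform.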

Use this to remove characters: put the characters you want removed in the array and pass the string to the method. It is a static method.

    public static string RemoveCharSpecials(string document)
    {
        var charsToRemove = new string[] { "@", ",", ".", ";", "'", "(", ")", "-", " ", "/" };

        try
        {
            if (!string.IsNullOrEmpty(document))
            {
                foreach (var c in charsToRemove)
                    document = document.Replace(c, string.Empty);
            }

            return document;
        }
        catch
        {
            return "";
        }
    }
CelzioBR
  • Catching all exceptions, ignoring them, and then returning an empty string is pretty much the **worst** thing you can do here. I hope you don't actually write production code that does things like this... – Broots Waymb Oct 25 '19 at 15:07
  • Not to mention you're replacing characters OP wants to keep. Why have a blacklist of characters when they appear to just want to remove whitespace/newlines? – Code Stranger Oct 25 '19 at 15:08
  • @BrootsWaymb, in this case the try block will never throw – CelzioBR Oct 25 '19 at 15:16
  • @CelzioBR - Doesn't mean it's not terrible coding practice. But yes, it shouldn't even be here anyway. – Broots Waymb Oct 25 '19 at 15:18
  • @CodeStranger this is meant to remove whatever characters you want; it fits all cases – CelzioBR Oct 25 '19 at 15:19
  • @BrootsWaymb, why not? It works very well ;D – CelzioBR Oct 25 '19 at 15:20
  • @CelzioBR - Because it both serves no purpose and is terrible exception handling practice. Just because it doesn't cause problems in this case doesn't mean it's good. It does not "work very well". – Broots Waymb Oct 25 '19 at 15:23