6

Here's my working code:

string Input;
string Output;

Input = data;
Output = Input.Replace(@")", "");

Here, I am simply removing the closing parenthesis ")" from my string, if it exists. Now how do I expand the list of offending characters to include "(" and "-" as well?

I realize I can write 2 more Output-like statements, but I'm wondering if there is a better way...

Bob.
MrPatterns
  • 2
    Same question: http://stackoverflow.com/questions/7411438/remove-characters-from-c-sharp-string – Rudis Oct 09 '13 at 19:15

8 Answers

12

If you're just doing a couple of replacements (I see you're only doing three), the easiest way, without worrying about Regex or StringBuilder, is to chain three Replace calls into one statement:

Output = Input.Replace("(", "").Replace(")", "").Replace("-", "");

... which is marginally better than storing the result in Output every time.

Kevin T
11
Output = Regex.Replace(Input, "[()-]", "");

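The [] characters in the expression create a character class; the brackets themselves are not matched literally. The hyphen is placed last inside the class so that it is treated as a literal character rather than as a range operator.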
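For completeness, here is a minimal, self-contained sketch of the one-liner above. The class name `RegexStripDemo`, the helper `Strip`, and the sample phone number are made up for illustration; only `Regex.Replace` and the `[()-]` pattern come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Character class matching '(', ')' or '-'.
    // The hyphen comes last so it is a literal, not a range operator.
    private static readonly Regex Unwanted = new Regex("[()-]");

    // Hypothetical helper: returns input with every matched character removed.
    public static string Strip(string input)
    {
        return Unwanted.Replace(input, "");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("(555) 123-4567")); // prints "555 1234567"
    }
}
```

Caching the `Regex` in a static field avoids re-parsing the pattern on every call, which may matter if the method is called in a loop.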

Joel Coehoorn
8

LINQ solution:

Output = new String(Input.Except("()-").ToArray());
Ben Voigt
  • 12
    `Except` returns a distinct set of characters, so the output is incorrect (eg. `Test` will become `Tes`), you can look this up on [MSDN](https://msdn.microsoft.com/en-us/library/vstudio/bb300779.aspx). – Ronald Feb 12 '15 at 13:38
  • @Ronald: Your underlying claim needs to be tested, but the specifics you claim are both false. `"Test"` has no repeat elements. And the linked MSDN page does not anywhere say that elements in `input1` are omitted if they have already been output. – Ben Voigt Feb 12 '15 at 15:00
  • 2
    @Ronald: You are correct that repetitions will be removed... that behavior is, however, not documented. I had to look at the implementation of `ExceptIterator`. – Ben Voigt Feb 12 '15 at 15:02
  • 2
    @BenVoigt You are right about my example, letter casing does matter, so: `test` will become `tes`. The MSDN documentation isn't clear indeed, only the community additions (comments) for .NET Framework 4 show this (at the time of writing). – Ronald Feb 17 '15 at 08:28
  • 1
    This doesn't seem to work, e.g. [dbo].[Temp_CSVLoad] using "[]" as the except parameter produces dbo.Temp_CSVLa – Neil Walker Aug 22 '18 at 16:16
  • @NeilWalker: Yes, please file a documentation bug, because the actual behavior of Except (as discussed in the comments) is not described in the documentation. – Ben Voigt Aug 22 '18 at 17:01
  • It actually _is_ documented: _The_ set _difference of two_ sets _is defined as the members of the first_ set _that don't appear in the second_ set. (https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except) – Stephan Nov 06 '21 at 21:00
  • @Stephan: That documentation is incomplete. Of particular importance is the unstated "and have not appeared earlier in the first sequence" – Ben Voigt Nov 08 '21 at 15:47
  • It boils down to the definition of a _set_ vs a collection or a sequence: A set can only contain distinct elements. However, one could submit a PR to the documentation to put more emphasis on that. – Stephan Nov 09 '21 at 16:09
  • @Stephan: Indeed, the word *set* implies distinctness.... but LINQ `Except` doesn't actually take sets as inputs, the argument types are all collections (sequences). So the behavior of sequences with repeated elements needs to be documented, which would not be the case if the inputs were e.g. `HashSet` where repeats were simply impossible. – Ben Voigt Nov 09 '21 at 16:45
  • It's documented now: "This method returns those elements in first that don't appear in second. ... Only unique elements are returned." This answer is misleading. – DasKrümelmonster Jul 26 '23 at 05:24
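As the comments above point out, `Except` behaves like a set difference and drops repeated characters. A possible LINQ alternative that keeps repeats is to filter with `Where` instead. This is only a sketch: the class name `LinqStripDemo` and the helper `Strip` are hypothetical, and the sample input is the one from Neil Walker's comment.

```csharp
using System;
using System.Linq;

public static class LinqStripDemo
{
    // Hypothetical helper: removes every character that occurs in `unwanted`.
    public static string Strip(string input, string unwanted)
    {
        // Where filters element by element, so repeated characters survive,
        // unlike Except, which yields each distinct element only once.
        return new string(input.Where(c => unwanted.IndexOf(c) < 0).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Strip("[dbo].[Temp_CSVLoad]", "[]")); // prints "dbo.Temp_CSVLoad"
    }
}
```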
4

As an alternative to Regex, it may be easier to keep the replacements in a collection and apply them with a StringBuilder.

var replacements = new[] { "(", ")", "-" };
var output = new StringBuilder(Input);
foreach (var r in replacements)
    output.Replace(r, string.Empty);
Khan
3

You can use Regex.Replace(), documented here.

Software Engineer
1

You can use a List that contains your bad words, then iterate over it with ForEach and replace every bad string.

StringBuilder output = new StringBuilder("(Hello) W,o.r;ld");
List<string> badwords = new List<string>();
badwords.Add("(");
badwords.Add(")");
badwords.Add(",");
badwords.Add(".");
badwords.Add(";");
badwords.ForEach(bad => output = output.Replace(bad, String.Empty));
//Result "Hello World"

Kind regards.

//Edit: Implemented changes suggested by Khan.

Marco
  • 1
    You should change `source` to a StringBuilder or you're going to be creating a new string in memory for each *badword*. – Khan Oct 09 '13 at 19:48
  • Good point, Thank you. I have implemented the changes. – Marco Oct 09 '13 at 19:57
1

This will let you do the same thing:

    private static string ReplaceBadWords(string[] BadStrings, string input)
    {
        StringBuilder sb = new StringBuilder(input);
        BadStrings.ToList().ForEach(b => 
        {
            if(b != "") 
            {
                sb = sb.Replace(b, string.Empty);
            }
        });

        return sb.ToString();
    }

Sample usage would be

        string[] BadStrings = new string[]
        {
            ")",
            "(",
            "random",
            ""
        };

        string input = "Some random text()";
        string output = ReplaceBadWords(BadStrings, input);
Brandon Johnson
0

I'd probably use a regular expression, as it's terse and to the point. If you're scared of regular expressions, you can teach the computer to write them for you. Here's a simple class for cleaning strings; you just provide it with a list of invalid characters:

using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class StringCleaner
{
  private Regex regex ;

  public StringCleaner( string invalidChars ) : this ( (IEnumerable<char>) invalidChars )
  {
    return ;
  }
  public StringCleaner ( params char[] invalidChars ) : this( (IEnumerable<char>) invalidChars )
  {
    return ;
  }
  public StringCleaner( IEnumerable<char> invalidChars )
  {
    const string    HEX     = "0123456789ABCDEF" ;
    SortedSet<char> charSet = new SortedSet<char>( invalidChars ) ;
    StringBuilder   sb      = new StringBuilder( 2 + 6*charSet.Count ) ;

    sb.Append('[') ;
    foreach ( ushort c in charSet )
    {
      sb.Append(@"\u" )
        .Append( HEX[ ( c >> 12 ) & 0x000F ] )
        .Append( HEX[ ( c >>  8 ) & 0x000F ] )
        .Append( HEX[ ( c >>  4 ) & 0x000F ] )
        .Append( HEX[ ( c >>  0 ) & 0x000F ] )
        ;
    }
    sb.Append(']') ;
    this.regex = new Regex( sb.ToString() ) ;
  }

  public string Clean( string s )
  {
    if ( string.IsNullOrEmpty(s) ) return s ;
    string value = this.regex.Replace(s,"") ;
    return value ;
  }

}

Once you have that, it's easy:

static void Main(string[] args)
{
  StringCleaner cleaner = new StringCleaner( "aeiou" ) ;
  string dirty = "The quick brown fox jumped over the lazy dog." ;
  string clean = cleaner.Clean(dirty) ;
  Console.WriteLine( clean ) ;
  return;
}

At the end of which clean is Th qck brwn fx jmpd vr th lzy dg.

Easy!

Nicholas Carey