25

My requirement is:

I have to replace some special characters like * ' " , _ & # ^ @ with string.Empty, and I have to replace blank spaces with -.

This is my code:

 Charseparated = Charseparated
    .Replace("*","")
    .Replace("'","")
    .Replace("&","")
    .Replace("@","") ...

With so many characters to replace, I need just as many Replace calls, which I want to avoid.

Is there a more efficient way to remove the special characters and, at the same time, replace blank spaces with -?

Dmitry Bychenko
Venkat
  • Use a regular expression – Myrtle Jun 23 '17 at 07:23
  • 2
    Just want to add that if this is about generating a valid filename, you can get the set of invalid chars by using [System.IO.Path.GetInvalidFileNameChars](https://msdn.microsoft.com/en-us/library/system.io.path.getinvalidfilenamechars(v=vs.110).aspx). – Georg Patscheider Jun 23 '17 at 07:34
  • 3
    Special how? These aren't special characters. Are you trying to clean up file paths? Read a CSV with quoted text? Sanitize SQL input? There are better alternatives for each case that don't require replacements – Panagiotis Kanavos Jun 23 '17 at 07:57
  • `StringBuilder.Replace()` is a more efficient alternative for `String.Replace()` as discussed [here](https://stackoverflow.com/questions/6524528/string-replace-vs-stringbuilder-replace). You'll still have to use many calls of `Replace()` though. – Seth Denburg Jun 23 '17 at 13:24
  • You have a problem. You solve it with a regular expression. Now you have two problems. (sorry could not help myself :) – Daniel Williams Dec 11 '20 at 20:25

14 Answers

24

I believe the best approach is to use a regular expression, as below:

s/[*'",_&#^@]//g

You can use the Regex class for this purpose:

Regex reg = new Regex("[*'\",_&#^@]");
str1 = reg.Replace(str1, string.Empty);

// Second pass: replace each blank space with a hyphen.
Regex reg1 = new Regex("[ ]");
str1 = reg1.Replace(str1, "-");
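For reference, here is a minimal self-contained sketch of the same two steps. The names `SlugDemo` and `Clean` are made up for illustration, and it assumes the character list from the question is the complete set to remove. Since the second step swaps a single character, plain `string.Replace(char, char)` is used there instead of a second regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugDemo
{
    // Characters to delete. '^' is not the first character in the class,
    // so it is treated as a literal rather than as a negation.
    private static readonly Regex Unwanted = new Regex("[*'\",_&#^@]");

    // Hypothetical helper: removes the unwanted characters, then turns spaces into hyphens.
    public static string Clean(string input)
    {
        string withoutSpecials = Unwanted.Replace(input, string.Empty);
        return withoutSpecials.Replace(' ', '-');
    }

    public static void Main()
    {
        Console.WriteLine(Clean("a*b c@d")); // ab-cd
    }
}
```

Keeping the `Regex` in a static field avoids rebuilding it on every call, which matters mostly when `Clean` runs in a loop.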
Patrick Hofman
Rahul
10

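Use a regular expression:

Regex.Replace("Hello*Hello'Hello&Hello@Hello Hello", @"[^0-9A-Za-z ,]", "").Replace(" ", "-")

The regex removes every character that is not an ASCII letter, a digit, a space, or a comma; the trailing Replace then turns each space into "-". Note that commas are kept; drop the comma from the character class if they should be removed too.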

Mitesh Gadhiya
  • it is not typo - Space is there. as we want to replace space with "-" – Mitesh Gadhiya Jun 23 '17 at 13:14
  • First It is replacing all special characters with Empty string. At first replace using regx, space will be there and after it is replacing space with hyphen – Mitesh Gadhiya Jun 23 '17 at 13:17
  • Sorry, I clearly had a special moment. You are completely right and I will remove my comments (after giving enough time for you to read this). I clearly shouldn't be looking at SO right now. ;-) – Chris Jun 23 '17 at 13:18
8
Regex.Replace(source_string, @"[^\w\d]", "_");

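This replaces every character in the given string (source_string) that is not a word character (a letter, a digit, or an underscore) with '_'. The `\d` is redundant, because `\w` already matches digits.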
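A small sketch of what the `\w` class lets through. The helper name `Strip` is hypothetical, and it deletes the matches instead of replacing them with '_' so the surviving underscore is easy to see:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Hypothetical helper: deletes every character matched by the given pattern.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        // Both patterns behave the same, and the underscore survives
        // because \w treats it as a word character.
        Console.WriteLine(Strip("a_b*c 1", @"[^\w\d]")); // a_bc1
        Console.WriteLine(Strip("a_b*c 1", @"[^\w]"));   // a_bc1
    }
}
```

So if underscores must go as well (as in the question), they have to be handled explicitly.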

Yevhen Horbunkov
Susmita Kundu
7

Make a collection of changes to make and iterate over it:

var replacements = new []
                   { new { Old = "*", New = string.Empty }
                   // all your other replacements, removed for brevity
                   , new { Old = " ", New = "-" }
                   };

foreach (var r in replacements)
{
    Charseparated = Charseparated.Replace(r.Old, r.New);
}
Patrick Hofman
  • It's essentially going to do the same what OP posted but yes less typing – Rahul Jun 23 '17 at 07:26
  • No, it is the exact same OP does, but then less repetitive. That was the entire goal right? – Patrick Hofman Jun 23 '17 at 07:26
  • This will result in a lot of temporary strings. A huge memory waste, if there's a lot of input data, and eat up a lot of CPU in garbage collection – Panagiotis Kanavos Jun 23 '17 at 07:59
  • @PanagiotisKanavos `charseperated` is not declared above it could be a `StringBuilder`, if it isn't it most probably should. – Scrobi Jun 23 '17 at 08:10
  • That would still be slower than a regex. N scans vs 1 – Panagiotis Kanavos Jun 23 '17 at 08:39
  • @PanagiotisKanavos - well, `Regex` (even compiled) is very powerful but horribly slow (in my experience). In my tests (10000 randomly generated strings, each 1000 characters long with about half and half "special" characters and non-special characters) this took only about half the time of the `Regex` solution (`Regex`: `1.11` seconds; This: `0.54` seconds). And that includes reading the `string` value into a `StringBuilder` (`new StringBuilder(stringValue)`). – Corak Jun 23 '17 at 11:48
  • @Corak your experience is many orders of magnitude wrong. It also depends on *what* you tested - a one-on/one-off string is a bit ...contrived. It also depends on what you measured - did you include garbage collection time? If you want to get usable numbers use BenchmarkDotNet. You'll be horrified when you see the *real* numbers – Panagiotis Kanavos Jun 23 '17 at 11:54
  • @Corak also try a *realistic* example. Pick a large log file and try to parse it. – Panagiotis Kanavos Jun 23 '17 at 11:54
  • @PanagiotisKanavos - Unless OP gives us a whole bunch of values, no example is more *realistic* than the other. I laid out my test scenario. 10000 randomly generated strings, each 1000 characters with about half and half "special" characters and non-special characters. Yes, I put `GC.Collect();GC.WaitForPendingFinalizers();` between tests, used `Stopwatch` and I ran it separately released with no debugger attached. Do you suggest a different string length and/or different special-to-non-special character ratio I should try? – Corak Jun 23 '17 at 12:11
  • @Corak no I suggest you use BenchmarkDotNET. And pick *realistic* scenario - not one where every other character matches. The fastest in this case would be just to scan the characters once, not perform N replacements. – Panagiotis Kanavos Jun 23 '17 at 12:11
  • @PanagiotisKanavos - Alright, will do. I'd suggest you do the same, so we can compare results. -- but again, what is "realistic"? 1 special in 10? 1 special in 100? – Corak Jun 23 '17 at 12:14
  • @corak panagiotis is right. You shouldn't use that code in a tight loop because of the reason he gave. It is a bad idea performance wise. – Patrick Hofman Jun 23 '17 at 13:14
  • @PanagiotisKanavos - Yes, you were right. Using 10 random strings of length 100000 with about 1 in 100 special characters, this took about 40 ms while the `Regex` solution took about 25 ms. – Corak Jun 23 '17 at 13:18
  • 4
    Unless we get a specific use case, every "performance measurement" is not meaningful. Performing measurements on 100k strings is irrelevant if the possible input sizes are e.g. below 20 chars on average. "Avoid premature optimization" means in this case: Make the code working and maintainable, and only make optimizations when you have measurements that indicate otherwise. – hoffmale Jun 23 '17 at 16:37
5

You can try using LINQ:

  var source = "lala * lalala @ whowrotethis # ohcomeon &";

  var result = string.Concat(source.Select(c => c == ' ' 
     ? "-" 
     : "*'\",_&#^@".Contains(c) ? "" 
     : c.ToString()));
Peter Mortensen
Dmitry Bychenko
  • `c.ToString()` ends up creating one temporary string for every input character. Better to treat only with chars and convert the result to a string at the end – Panagiotis Kanavos Jun 23 '17 at 12:22
  • A `Where(c=>"*'\",_^@".Contains(c)).Select(c => c == ' ' ? '-':c)` would result in a *single* scan of the input string too - `where` eliminates the unwanted characters, `Select` performs the replacement – Panagiotis Kanavos Jun 23 '17 at 12:24
  • @PanagiotisKanavos: Your `Where` condition needs a `!` in front of it (as it stands it returns all special characters instead of everything else). – hoffmale Jun 23 '17 at 18:41
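Putting the comments above together (work on chars only, and negate the `Where` condition), a minimal sketch could look like this. The names `LinqFilterDemo` and `Clean` are made up for illustration, and the unwanted set is assumed to be the one from the question:

```csharp
using System;
using System.Linq;

public static class LinqFilterDemo
{
    // Assumed set of characters to drop, taken from the question.
    private const string Unwanted = "*'\",_&#^@";

    // Hypothetical helper: one pass over the input, no per-character strings.
    public static string Clean(string source)
    {
        char[] kept = source
            .Where(c => Unwanted.IndexOf(c) < 0)   // keep only characters NOT in the unwanted set
            .Select(c => c == ' ' ? '-' : c)       // swap spaces for hyphens
            .ToArray();
        return new string(kept);
    }

    public static void Main()
    {
        Console.WriteLine(Clean("a*b c")); // ab-c
    }
}
```

`IndexOf(c) < 0` is used instead of `Contains(c)` only so the check does not depend on which `Contains` overload the target framework provides.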
4

The LINQ and char[] way:

   string f = Filter("*WHAT/ #PO#PO");

It returns WHAT-POPO:

    private string Filter(string s)
    {
        var chars = new[] { '*', '/', '#' };
        var filteredChars = s.ToArray();
        return new string(filteredChars
                 .Where(ch => !chars.Contains(ch) )
                 .Select(ch => ch == ' ' ? '-' : ch).ToArray());
    }
Peter Mortensen
Laurent Lequenne
2

string.Replace is horrible, horrible, horrible and should not be used by professional programmers in anywhere but the most trivial of tasks.

Strings are immutable. This means that every time you do a string.replace (or a myString = myString + "lalala" and so on) the system needs to do all the logistics (creation of new pointer, copying of content, garbage collection etc). BTW, Patrick's answer above does just this (but with better code readability).

If this has to be done only a few times, it's not a problem --and the code is immediately readable.

But as soon as you put this operation in a loop, you need to write it in another way. I would go for regex myself:

string inStr = "lala * lalala @ whowrotethis # ohcomeon &";
string outStr = Regex.Replace(inStr , "[*|@|*|&]", string.Empty);
Jim Andrakakis
  • 3
    *string.Replace is horrible, horrible, horrible and should not be used by professional programmers in anywhere but the most trivial of tasks.*. That is just horrible. `string.Replace` must be used with care, but not using it doesn't solve problems. – Patrick Hofman Jun 23 '17 at 07:46
  • Agree (I did mention that using it in operations done a few times is fine) but still: if used in a loop, it has to be changed. I have made this point when doing code reviews many times. IMO it is important, especially for junior colleagues, to understand what their code does behind the scenes. – Jim Andrakakis Jun 23 '17 at 07:49
  • 1
    @PatrickHofman the multiple temporary strings generated by repeated Replace calls are way beyond horrible. Try it on a log file and watch memory usage skyrocket to gigabytes. A regex can do the same job 10 times faster, with just 20MB simply because it *doesn't* generate useless temporary strings – Panagiotis Kanavos Jun 23 '17 at 08:01
  • 1
    `String.Replace` is not "horrible, horrible, horrible" and whether you should use it has nothing to do with whether you have programming as your profession or not, but rather with what you are trying to achieve. Additionally your regex has superfluous `|` characters in it. – Matti Virkkunen Jun 23 '17 at 09:50
  • @MattiVirkkunen the regex was quickly written (I'm working!). I do maintain that "normal" string operations are horrible (read: memory and performance killers) for anything other than a few iterations --as Panagiotis explained. Sure, if it's one-off, fine. I said as much from the beginning. – Jim Andrakakis Jun 23 '17 at 10:15
  • 2
    @MattiVirkkunen the .NET Core team is spending a lot of time and effort into eliminating spurious allocations, even introducing a new UTF8String class. Multiple replacements *are* a bad idea no matter how you look at it. It's more about scale - 5 replacements isn't *that* much. 5*1000 rows is way too much. A single scan and filtering of the characters is better if all that's needed is to filter out characters – Panagiotis Kanavos Jun 23 '17 at 12:18
  • @PanagiotisKanavos: Yes, unnecessary allocations ought to be avoided, but this post starts by saying a method is horrible in general and should never ever be used which is not true. – Matti Virkkunen Jun 23 '17 at 13:16
  • @Matti "this post starts by saying a method is horrible in general" Yes, and this is by intention. I assumed that the person who asked is not a senior colleague. So my post had at least 50% of teaching intention. It's better to tell someone what to avoid and then give them the exceptions (if they overdo it, which is common, it will be on the "good but slightly less productive" side) than the other way around. That this is generally **not** done is exactly the reason people try to go through 5GB log files with `string.replace` . – Jim Andrakakis Jun 30 '17 at 08:34
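The "single scan and filtering of the characters" mentioned in the comments can be sketched roughly as follows. This is only an illustration: the names `SinglePassDemo` and `Clean` are hypothetical, and the unwanted set is assumed to be the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SinglePassDemo
{
    // Assumed set of characters to drop; a string is an IEnumerable<char>,
    // so it can seed the HashSet directly.
    private static readonly HashSet<char> Unwanted = new HashSet<char>("*'\",_&#^@");

    // Hypothetical helper: reads the input once and builds the result in a
    // single StringBuilder, so no intermediate strings are created.
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == ' ')
            {
                sb.Append('-');          // blank space becomes a hyphen
            }
            else if (!Unwanted.Contains(c))
            {
                sb.Append(c);            // everything else that is not unwanted is kept
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Clean("A sample input a*b#c@d")); // A-sample-input-abcd
    }
}
```

Whether this actually beats a regex for a given workload is something to measure rather than assume, as the discussion under the other answers shows.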
2

Here is a simple and efficient way to do that:

    public void foo()
    {
        string input = "A sample input a*b#c@d";
        string unwanted = "*'\",_&#^@";
        List<char> unwantedChars = unwanted.ToList<char>();
        StringBuilder sb = new StringBuilder();

        input = input.Replace(' ', '-');
        foreach(char c in input)
        {
            if (!unwantedChars.Any(x => x == c))
                sb.Append(c);
        }
        string output = sb.ToString(); //A-sample-input-abcd
    }
Sunil Purushothaman
2

The OP asked for an "efficient" way to replace strings.

In terms of performance, Regex isn't the best solution (in terms of readability or convenience it may be...).

Instead, StringBuilder performs considerably better, which may become important if you deal with large amounts of data.

StringBuilder sb = new StringBuilder(myString);
foreach (string unwanted in collectionOfUnwantedStrings)
{
    sb.Replace(unwanted, string.Empty);
}
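A runnable sketch of the same idea, extended with the space-to-hyphen step from the question. The names `StringBuilderReplaceDemo`, `Clean` and `UnwantedStrings` are placeholders for illustration, and the list of unwanted strings is assumed to be the characters from the question:

```csharp
using System;
using System.Text;

public static class StringBuilderReplaceDemo
{
    // Assumed list of strings to remove, taken from the question.
    private static readonly string[] UnwantedStrings =
        { "*", "'", "\"", ",", "_", "&", "#", "^", "@" };

    // Hypothetical helper: all replacements happen inside one StringBuilder,
    // so only the final ToString() allocates a new string.
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input);
        foreach (string unwanted in UnwantedStrings)
        {
            sb.Replace(unwanted, string.Empty); // an empty replacement removes the match
        }
        sb.Replace(' ', '-');                   // char overload for the space-to-hyphen step
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Clean("A sample input a*b#c@d")); // A-sample-input-abcd
    }
}
```

Each `Replace` call still scans the whole buffer, so this does N passes over the data, just without the N temporary strings.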
Peter Mortensen
BudBrot
1

Use string.Split with the array of separator chars, and aggregate the pieces back into one string. Removing the special characters and replacing " " with "-" must be done as separate steps, though.

        var res = "23#$36^45&".Split(new[] {'#', '$', '^', '&'}, StringSplitOptions.RemoveEmptyEntries)
            .Aggregate((s, s1) => s + s1);
        // "233645"
Stefan Balan
  • 1
    `.Aggregate` is so unnatural here, why not `string.Concat("23#$36^45&".Split(...));` – Dmitry Bychenko Jun 23 '17 at 08:16
  • It's true... `string.Concat` is a better fit in this use case, when you just want those chars gone. But the scenarios I've used this pattern before, also required replacing, the aggregate was actually something like `(s, s1) => s+ "_" + s1` – Stefan Balan Jun 23 '17 at 08:26
1

You can use Regex like this.

string Charseparated = "test * -";

var replacements = new Dictionary<string, string>()
{
   {"*", string.Empty},
   {" ", "-"}
};

var reg = new Regex(String.Join("|", replacements.Keys.Select(k => Regex.Escape(k))));
var reg_replace = reg.Replace(Charseparated, m => replacements[m.Value]);
Joseph
0

Pass in a string that contains special characters, and you get back a string in which each of them has been replaced with "-". Note: add your own special characters to the "replaceables" list.

protected string hasSpecialChar(string input)
        {
            string[] replaceables = new[] { @"\", "|", "!", "#", "$", "%", "&", "/", "=", "?", "»", "«", "@", "£", "§", "€", "{", "}", "^", ";", "'", "<", ">", ",", "`" };
            string rxString = string.Join("|", replaceables.Select(s => Regex.Escape(s)));
            return Regex.Replace(input, rxString, "-");
        }
  • The question also specifies that blank spaces should be replaced with `-`, so you'd need to make a slight modification to your answer – Matt Oct 29 '18 at 06:10
0
String str = "Whatever???@@##$ is#$% in& here";
     
str = Regex.Replace(str,@"[^\w\d\s]","");
NathanOliver
KaiOsmon
  • 1
    While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Yunnosch Aug 19 '22 at 18:28
  • 1
    You might also want to explain the advantage of your solution over existing ones on this page which are similar and better explained. – Yunnosch Aug 19 '22 at 18:28
-1

I was asked the same question in an interview: I was given a string containing special characters and was asked to remove them from the input string and then reverse it.

Here is a simple program that does this:

public class Program
    {
        static void Main(string[] args)
        {
            string name = "T@T@P&!M#";

            Program obj = new Program();

            Console.WriteLine(obj.removeSpecialCharacters(name));

            Console.WriteLine(obj.reverseString(obj.removeSpecialCharacters(name)));

            Console.ReadLine();


        }

        private string removeSpecialCharacters(string input)
        {
            string[] specialCharacters = new string[] { "@", "&", "!", "#" };

            for (int i = 0; i < specialCharacters.Length; i++)
            {
                if (input.Contains(specialCharacters[i]))
                {
                    input = input.Replace(specialCharacters[i], "");
                }
            }

            return input;
        }

        private string reverseString(string input)
        {
            string reverseString = "";

            for (int i = input.Length - 1; i >= 0; i--)
            {
                reverseString = reverseString + input[i];
            }

            return reverseString;
        }
    }
Anshu Rj
  • See [Regex.Replace](https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx) for a much better answer. – jwdonahue Nov 12 '17 at 16:41