14

My program is a file validation utility. I have to read in a format file and then split each line on a single space. But the person who wrote the format file may use tabs, two spaces, or any other form of whitespace, so I'm looking for some code that collapses any run of whitespace into a single space. I've tried this:

public static string RemoveWhitespace(this string line)
{
    try
    {
        return new Regex(@"\s*").Replace(line, " ");
    }
    catch (Exception)
    {
        return line;
    }
}

I'm assuming this is wrong. What should I do?

Peter Mortensen
New Start
  • If you are going to do this lots of times, you might want to construct the Regex object beforehand. A private static readonly field would do. Then you avoid creating the regex engine every time you replace a line. – Skurmedel Sep 16 '10 at 09:57
  • @Skurmedel: Or just use the built-in static `Regex.Replace` method. – LukeH Sep 16 '10 at 10:00
  • @LukeH: I think you missed my point. There's a reason why you can preconstruct the regex objects, and not only for easy reusability. See here http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx "Static vs Instance methods". Granted, if he/she doesn't use 15 different regexes in his/her application, there is probably no problem. But I don't know his/her application. That's why I said "might". – Skurmedel Sep 16 '10 at 10:15
  • The static method caches a limited number of the most recently used patterns, so if you know that your application isn't doing anything else with Regex between calls then it's optimal. If you're doing this in a library, you might prefer your own static instance to be sure that the consuming application doesn't accidentally cause cache misses. – stevemegson Sep 16 '10 at 10:20
  • Possible duplicate: *[How do I replace multiple spaces with a single space in C#?](https://stackoverflow.com/questions/206717/how-do-i-replace-multiple-spaces-with-a-single-space-in-c)* – Peter Mortensen Feb 18 '22 at 20:13
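
The suggestion in the comments to build the Regex once and reuse it could look roughly like the sketch below. The class name `LineNormalizer` and the field name `WhitespaceRun` are made up for illustration, the `\s+` pattern is an assumption about what the asker actually wants, and `RegexOptions.Compiled` is optional.

```csharp
using System.Text.RegularExpressions;

public static class LineNormalizer
{
    // Constructed once when the class is first used, then shared by every call.
    // (Hypothetical field name; RegexOptions.Compiled is an optional choice.)
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Same extension-method shape as in the question, minus the try/catch.
    public static string RemoveWhitespace(this string line)
    {
        // Replace returns a new string; the input is left unchanged.
        return WhitespaceRun.Replace(line, " ");
    }
}
```

With this in place, `"a\t\t b".RemoveWhitespace()` returns `"a b"`. Whether the private instance is worth it over the static `Regex.Replace` depends on how many other patterns the application uses, as the comments above point out.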

2 Answers

37

You can do this -

System.Text.RegularExpressions.Regex.Replace(str,@"\s+"," ");

where str is your string.
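
To see why `\s+` behaves differently from the `\s*` used in the question, here is a small sketch. The class and method names are hypothetical. `\s*` also matches the empty string, so it finds a match at every position and a space gets inserted between all characters, whereas `\s+` needs at least one whitespace character.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // \s+ : one or more whitespace characters, so each run becomes one space.
    public static string Collapse(string line)
    {
        return Regex.Replace(line, @"\s+", " ");
    }

    // \s* : zero or more, so the empty string between any two characters
    // also counts as a match and is replaced by a space.
    public static string CollapseWithStar(string line)
    {
        return Regex.Replace(line, @"\s*", " ");
    }
}
```

For example, `WhitespaceDemo.Collapse("a \t  b")` gives `"a b"`, while `WhitespaceDemo.CollapseWithStar("ab")` gives `" a b "` even though the input contains no whitespace at all.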

Sachin Shanbhag
  • I really want to accept this as my answer but it just doesn't seem to work. It just keeps throwing an exception. Also, just a general question; in regards to Regex, does '\s' just mean whitespace? – New Start Sep 16 '10 at 10:05
  • @New Start - Can you tell me what the error is? I hope you are using the proper namespace? – Sachin Shanbhag Sep 16 '10 at 10:08
  • @New Start - '\s' matches a whitespace character. Check this: http://www.regular-expressions.info/charclass.html#shorthand – Sachin Shanbhag Sep 16 '10 at 10:16
  • @New Start - I have tried this on my end. It works fine. If you can tell me what your error is, I can help you with that. – Sachin Shanbhag Sep 16 '10 at 10:45
  • I was using the proper namespace, yes! My problem was I was returning the original line instead of the edited line. Thank you for your help! – New Start Sep 16 '10 at 10:52
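
The mistake described in the last comment is easy to make because .NET strings are immutable: `Regex.Replace` never modifies its input, it returns a new string. The asker's actual code is not shown, so the sketch below is only a guess at what such a bug might look like, with hypothetical names throughout.

```csharp
using System.Text.RegularExpressions;

public static class ReturnValueDemo
{
    // Buggy variant (assumed shape): the result of Replace is thrown away,
    // so the caller gets the unmodified line back.
    public static string Broken(string line)
    {
        Regex.Replace(line, @"\s+", " ");
        return line;
    }

    // Fixed variant: return the string that Replace produced.
    public static string Fixed(string line)
    {
        string edited = Regex.Replace(line, @"\s+", " ");
        return edited;
    }
}
```

Here `ReturnValueDemo.Broken("a   b")` returns the input unchanged, while `ReturnValueDemo.Fixed("a   b")` returns `"a b"`.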
0
// Treat tabs as spaces first (other whitespace characters are not handled here).
input = input.Replace("\t", " ");

// Replace each double space with a single space until none remain.
// Every pass shortens the remaining runs, so the loop ends with
// each run of spaces reduced to exactly one space.
while (input.Contains("  "))
    input = input.Replace("  ", " ");
gamesguru