I am reading in two files and comparing them. One thing I realized I need to handle is spacing: differences in the amount of whitespace are being reported as mismatches, and I don't want spacing to count as a difference, so I want to remove it.

This is what I have so far:

            Dictionary<string, int> Comparer = new Dictionary<string, int>();
            string line;
            using (StreamReader sr = new StreamReader(openFileDialog1.FileName))
            {
                while (sr.Peek() >= 0 )
                {
                    line = sr.ReadLine();

                    if (Comparer.ContainsKey(line))
                        Comparer[line]++;
                    else
                        Comparer[line] = 1;
                }
            }

            using (StreamReader sr = new StreamReader(openFileDialog2.FileName))
            {
                while (sr.Peek() >= 0)
                {
                    line = sr.ReadLine();
                    if (Comparer.ContainsKey(line))
                        Comparer[line]--;
                    else
                        Comparer[line] = -1;
                }
            }

            int mismatches = 0;

            var firstFileChanges = new List<string>();
            var secondFileChanges = new List<string>();

            System.Text.StringBuilder theStringBuilder = new System.Text.StringBuilder();
            foreach (KeyValuePair<string, int> kvp in Comparer)
            {
                if (kvp.Value != 0)
                {
                    mismatches++;
                    string InWhich = kvp.Value > 0 ? openFileDialog1.FileName : openFileDialog2.FileName;

                    if (InWhich == openFileDialog1.FileName)
                    {
                        firstFileChanges.Add(kvp.Key);
                    }
                    else
                    {
                        secondFileChanges.Add(kvp.Key);
                    }
                }
            }
            if (firstFileChanges.Count > 0)
            {
                theStringBuilder.Append("ADDED IN " + openFileDialog1.SafeFileName+": \n");

                int counter1 = 0;
                foreach (string row in firstFileChanges)
                {
                    if (counter1 > 0)
                    {
                        theStringBuilder.Append("\n ");
                    }
                    theStringBuilder.Append(row);
                    counter1 += 1;
                }
                theStringBuilder.AppendLine();
            }

            if (secondFileChanges.Count > 0)
            {
                theStringBuilder.Append("\nDELETED FROM "+openFileDialog2.SafeFileName+": \n");

                int counter2 = 0;
                foreach (string row in secondFileChanges)
                {
                    if (counter2 > 0)
                    {
                        theStringBuilder.Append("\n ");
                    }

                    theStringBuilder.Append(row);

                    counter2 += 1;
                }
            }

Example Input file: Name (spaaaaaaace) Title (spaaaaaaace) Status

I would like it to be: Name Title Status

Masriyah

4 Answers

5

Just replace each run of whitespace with a single space:

string cleanedLine = System.Text.RegularExpressions.Regex.Replace(line,@"\s+"," ");
if (Comparer.ContainsKey( cleanedLine ))
    Comparer[ cleanedLine ] ++;
else
    Comparer[ cleanedLine ] = 1;
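
For the counts to cancel out, the same normalization has to be applied when reading the second file too; otherwise cleaned keys from the first file will never match the raw lines of the second. A minimal sketch of that idea follows. The names `WhitespaceDemo`, `Normalize` and `Count` are hypothetical helpers, not part of the original code, and the in-memory string arrays stand in for the two files. The extra `Trim()` is an assumption that leading and trailing spacing should be ignored as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper: trims the line and collapses every run of
    // whitespace (spaces, tabs, ...) into a single space.
    public static string Normalize(string line)
    {
        return Regex.Replace(line.Trim(), @"\s+", " ");
    }

    // Adds delta to the count stored under the normalized form of each line.
    public static void Count(Dictionary<string, int> counts, IEnumerable<string> lines, int delta)
    {
        foreach (string line in lines)
        {
            string key = Normalize(line);
            int current;
            counts.TryGetValue(key, out current); // current stays 0 if the key is missing
            counts[key] = current + delta;
        }
    }

    public static void Main()
    {
        var counts = new Dictionary<string, int>();

        // Stand-ins for the lines of the first and second file.
        Count(counts, new[] { "Name      Title      Status", "Only in first" }, +1);
        Count(counts, new[] { "Name Title\tStatus" }, -1);

        // Only lines whose counts did not cancel out are reported.
        foreach (KeyValuePair<string, int> kvp in counts)
        {
            if (kvp.Value != 0)
                Console.WriteLine(kvp.Key); // prints: Only in first
        }
    }
}
```

In the original code this would amount to calling the same cleaning step on `line` in both `while` loops before touching the dictionary.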
Tim Schmelter
  • This is a good solution, +1. But you're too quick at deleting your comments and therefore deserve -1 ;) – pbalaga Aug 29 '13 at 14:03
  • This worked, but for some reason I wasn't getting the results I was expecting. I am looking into it, but the solution below returned the results I was looking for. – Masriyah Aug 29 '13 at 14:07
2

The following removes all whitespace characters (spaces, line breaks, etc.) from your string:

// Requires: using System.Linq;
string NoWhiteSpaceString = new String(yourString
                                     .Where(r=> !char.IsWhiteSpace(r))
                                     .ToArray());

EDIT: To replace runs of multiple spaces with a single space, you can try:

string yourString = "Name           Title           Status";
string NoWhiteSpaceString =
    string.Join(" ", 
            yourString.Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries));

Result would be:

NoWhiteSpaceString = "Name Title Status"
user2711965
  • @TimSchmelter: *constantly*? In what sense? It creates one array per call (and a list before that) - in the LINQ part at least. Another copy happens in the string .ctor. – pbalaga Aug 29 '13 at 13:53
  • @rook: I've deleted my comment because it was almost micro-optimization. However, since you have commented on my own answer, here it is again. This creates a new (possibly long) `string`, creates a new lambda every character in the string, enumerates all characters, creates a new `char[]` of an unknown size, so it [wastes memory](http://stackoverflow.com/a/16323412/284240). All this for every line. Last, it removes all white-spaces even if OP wanted `Name Title Status` as result. – Tim Schmelter Aug 29 '13 at 14:16
  • @TimSchmelter, just updated the answer according to OP's example as well. I am still not sure about the optimization. – user2711965 Aug 29 '13 at 14:18
  • @TimSchmelter: I agree with all of it apart from this: `creates a new lambda every character in the string`. It certainly *calls* the lambda for every character, if this is what you mean. It is not an `Expression<>`, which would need `.Compile()` and a caching approach. Also, I'm not sure what OP actually wants, given that the initial version of this solution produced the expected output for him. – pbalaga Aug 29 '13 at 14:33
  • @rook: You are possibly right with the lambda (where is J.Skeet?). However, as i remember there is a difference in: 1. `.Where(r=> char.IsWhiteSpace(r))` and 2. `.Where(char.IsWhiteSpace)`, the latter is more efficient. According to OP's requirement, who knows, but he mentioned that _"I would like it to be : Name Title Status"_ (contains single white-spaces). – Tim Schmelter Aug 29 '13 at 14:54
  • @TimSchmelter: of course, because `r=> char.IsWhiteSpace(r)` involves one more call, while `char.IsWhiteSpace` calls the actual working method directly. Both are passed as delegates and need to be called, but the option no. 1 has a redundant indirection. I wouldn't expect the compiler to optimize this away. – pbalaga Aug 29 '13 at 15:09
  • @rook: afaik because `char.IsWhiteSpace` is an existing delegate and `r=> char.IsWhiteSpace(r)` needs to create an anonymous function(expression). But i must admit that i'm not that familiar with these implementation details. – Tim Schmelter Aug 29 '13 at 15:11
1

Well, if you have a string x, you can do

x = x.Trim();

// Strings are immutable, so the result of Replace must be assigned back to x.
while (x.Contains("  "))
{
   x = x.Replace("  ", " ");
}

That way, the largest gap between words or sentences will be a single space.

If you just want to remove every space and tab, you can do

x = x.Replace(" ", "");
x = x.Replace("\t", "");

and that will remove all spaces and tabs from your string (other whitespace characters such as \r and \n are left alone).

No Idea For Name
  • `x.Replace(" ", "");` will not remove all whitespaces. Remember there are `\t`, `\r`, `\n` characters. – pbalaga Aug 29 '13 at 13:51
  • might be that you are correct on \t, but \r and \n? @rook look at his example and question, nothing is said on new line. edited the answer accordingly – No Idea For Name Aug 29 '13 at 14:00
1

This will replace each run of multiple spaces with a single space.

string input = "Name      Title        Status";
string result = string.Join(" ", input.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)); //result is "Name Title Status"
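
Splitting on `' '` only handles the space character, so tabs between columns would survive. As a hedged variant, passing a null separator array makes `Split` treat every whitespace character as a delimiter. The class and method names below (`SplitJoinDemo`, `CollapseWhitespace`) are made up for illustration.

```csharp
using System;

public static class SplitJoinDemo
{
    // Hypothetical helper: a null separator array tells Split to use all
    // whitespace characters as delimiters; empty entries are dropped, so
    // leading and trailing whitespace disappears as well.
    public static string CollapseWhitespace(string input)
    {
        string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        Console.WriteLine(CollapseWhitespace("  Name \t  Title\t\tStatus  "));
        // prints: Name Title Status
    }
}
```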
Pierre-Luc Pineault