0

I need to compare two text files, where file1 is the baseline and file2 is the file to check against it. I want to write the differences to file3, including missing lines, extra lines, and lines whose content differs.

With my current code, if the 2nd line is missing from file2, every following line of file2 is written to file3. How can I skip over the missing line in this case and report only the lines that actually differ? Any ideas?

// Fragment: the enclosing method and the declarations of Result, rowNumber1,
// rowed and result are not shown.
int file1LineNo = 0;
int file2LineNo = 0;
string file1lineStr;
string file2Str;
SortedDictionary<int, Object[]> info = new SortedDictionary<int, Object[]>();
string[] file1Lines = File.ReadAllLines(file1Name);
string[] file2Lines = File.ReadAllLines(file2Name);
while (file1LineNo < file1Lines.Length)
{
  file1lineStr = file1Lines[file1LineNo];
  if (file1lineStr != null)
  {
    // Advance through file2 until the same line number is reached.
    while (file2LineNo < file2Lines.Length)
    {
      file2Str = file2Lines[file2LineNo];
      if (file1LineNo == file2LineNo)
      {
        // Lines are compared by position only, so a single missing line
        // makes every later line look different.
        if (!file2Str.Trim().Equals(file1lineStr.Trim()))
        {
          Result = false;
          info.Add(rowNumber1++, new Object[] { "", file1lineStr, file2Str });
        }
        break;
      }
      file2LineNo++;
    }
  }
  file1LineNo++;
}
foreach (var infoValue in info)
{
  Object[] objectArr = infoValue.Value;
  for (int i = 0; i < objectArr.Length; i++)
  {
    result.WriteResultToFile3(....);
  }
  rowed++;
}
return Result;
}
}
}
HCYH
  • Can you please share some of the code you've done so far? – Jlalonde Mar 19 '19 at 22:57
  • You would need to stream the two files and compare them line by line, flagging each difference so that it can be written to the output. – Greg Mar 19 '19 at 22:58
  • Use a version control system like GitHub, Azure Repos, or Subversion. –  Mar 19 '19 at 23:03
  • Why are you doing this yourself instead of using [an existing tool](https://stackoverflow.com/questions/138331/)? – Dour High Arch Mar 19 '19 at 23:03
  • Give yourself plenty of time, energize your brain cells, and read a lot about the LCS problem (Longest Common Subsequence problem). In your case, the elements of the sequence to analyze would be whole text lines (each text line representing a single element). Or, much better, just don't do what I just said and spend some time looking for and trying some libraries that provide the desired diff functionality... –  Mar 19 '19 at 23:04
  • I have added my current script, which doesn't skip missing and extra lines when comparing; it just writes all the differences out. – HCYH Mar 19 '19 at 23:27
  • Oh seems I can Install-Package Diff.Match.Patch ? – HCYH Mar 19 '19 at 23:31
  • One trick will be determining whether a line is missing from one file or whether only a blank line was inserted, which means you have to read ahead as well. What if you have "hello" on file1.line1 and "" on file2.line1, then "world" on file1.line2 and "hello" on file2.line2, with "world" on file2.line3? To a human, the files are identical except that file2 has an extra line at the top. But if you compare each line only to its corresponding line number in the other file, they will appear completely different (no two lines match at the same line number). – Rufus L Mar 20 '19 at 01:01
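
The read-ahead problem raised in these comments is what a Longest Common Subsequence (LCS) comparison addresses. What follows is only a minimal sketch of that idea, not a replacement for a tested diff library: the class name `LineDiff`, the method `Compare`, and the "- " / "+ " / two-space prefixes are all assumptions made for illustration. It treats each whole line as one element, builds the LCS length table, and then walks it to label lines as missing from file2, extra in file2, or common. A changed line shows up as a "- " line followed by a "+ " line.

```csharp
using System;
using System.Collections.Generic;

public static class LineDiff
{
    // Hypothetical helper: returns one entry per line, prefixed with
    // "  " (in both files), "- " (only in the baseline a) or "+ " (only in b).
    public static List<string> Compare(string[] a, string[] b)
    {
        int n = a.Length, m = b.Length;

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table from the top, always moving in the direction
        // that keeps the longest common subsequence reachable.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < n && y < m)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < n) result.Add("- " + a[x++]); // rest of the baseline is missing from b
        while (y < m) result.Add("+ " + b[y++]); // rest of b is extra
        return result;
    }

    public static void Main()
    {
        // The example from the comment above: file2 has one extra blank line at the top.
        var file1 = new[] { "hello", "world" };
        var file2 = new[] { "", "hello", "world" };
        foreach (var line in Compare(file1, file2))
            Console.WriteLine(line);
    }
}
```

For that example the sketch reports the blank line as the only "+ " entry and both "hello" and "world" as common. The table needs memory proportional to the product of the two line counts, so for large files an existing diff library is probably the better choice.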

3 Answers

2

This writes the differences between the two files to a .diff file and to a .txt file with the same content. You can adjust the output further with the git diff options. You will need the Git client installed on your machine, or you could perhaps embed it in your project through a NuGet package.

https://git-scm.com/downloads

using System;
using System.Collections.ObjectModel;
using System.Management.Automation;

namespace PowerShell_Export_Differences
{
class Program
{
    static void Main(string[] args)
    {
        // Folder that contains Text1.txt and Text2.txt.
        string directory = "C:/PowershellTest";

        using (PowerShell powershell = PowerShell.Create())
        {
            // AddStatement() ends each command; without it, successive
            // AddScript calls are added to the same pipeline.
            powershell.AddScript(String.Format(@"cd {0}", directory)).AddStatement();
            powershell.AddScript(@"git init").AddStatement();
            // --no-index compares two files that are not tracked by git.
            powershell.AddScript(@"git diff --no-index  Text1.txt Text2.txt > Text3.diff").AddStatement();
            powershell.AddScript(@"git diff --no-index  Text1.txt Text2.txt > Text3.txt");
            Collection<PSObject> results = powershell.Invoke();
            Console.Read();
        }
    }
}
}

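
If hosting PowerShell is more than you need, the same `git diff --no-index` command can probably be started directly with `System.Diagnostics.Process`. The following is a rough sketch under that assumption: the class name `GitDiffRunner` and the method `Diff` are made up for illustration, `git` is assumed to be on the PATH, and the file paths are assumed not to contain double quotes.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class GitDiffRunner
{
    // Hypothetical helper: runs "git diff --no-index" on two files and
    // returns the unified diff text (empty when the files are identical).
    public static string Diff(string file1, string file2)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "git", // assumes git is installed and on the PATH
            Arguments = "diff --no-index -- \"" + file1 + "\" \"" + file2 + "\"",
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };

        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            // Exit code 0 means no differences, 1 means differences were found;
            // anything higher is treated as a failure here.
            if (process.ExitCode > 1)
                throw new InvalidOperationException("git diff failed with exit code " + process.ExitCode);
            return output;
        }
    }

    public static void Main(string[] args)
    {
        // Usage: <program> file1 file2 file3
        if (args.Length < 3)
        {
            Console.WriteLine("Usage: <program> file1 file2 file3");
            return;
        }
        File.WriteAllText(args[2], Diff(args[0], args[1]));
    }
}
```

In the diff text, lines starting with "-" are present only in the first file and lines starting with "+" only in the second.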

  • Thank you Jesse, I'll have a try later :) Currently I just print out all the differences line by line.. – HCYH Mar 27 '19 at 23:23
0

Your question is a little unspecific, and you really need to show some code.

That said, if the problem is basically blank or empty lines, try the following.

Remove all the blank or empty lines from both file1 and file2, then do the comparison.

To be honest, this is a bit "bodgy", but it may answer your question.
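
As a rough illustration of the filtering step, assuming the lines have already been read with something like `File.ReadAllLines`, the sketch below drops blank and whitespace-only lines before comparing. The class name `BlankLineFilter` and the method `RemoveBlankLines` are invented for this example.

```csharp
using System;
using System.Linq;

public static class BlankLineFilter
{
    // Hypothetical helper: drops lines that are empty or contain only whitespace.
    public static string[] RemoveBlankLines(string[] lines) =>
        lines.Where(line => !string.IsNullOrWhiteSpace(line)).ToArray();

    public static void Main()
    {
        var file1 = new[] { "hello", "world" };
        var file2 = new[] { "", "hello", "   ", "world" };

        // After filtering, the two files contain the same lines in the same order.
        bool same = RemoveBlankLines(file1).SequenceEqual(RemoveBlankLines(file2));
        Console.WriteLine(same); // prints "True"
    }
}
```

Note that filtering discards the original line numbers, and it only helps when blank lines are the sole source of the shift; a missing non-blank line will still throw off a position-by-position comparison.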

Show what you have already tried (code-wise) and folks may be more willing to write something.

William Humphreys
-2

You have to read each line from both files, store the lines in separate strings, and then compare them, e.g.

string s1 = "first";  // the line you have read from file1
string s2 = "second"; // the line you have read from file2

if (!s1.Equals(s2))
{
    // store the result in file3 if the lines are not equal
}

NOTE: this code only works when the lines are in the same order in both files.

B.Dorsey
  • Yeah, that does not help at all in identifying missing or additional/extra lines. Your suggested answer can only tell whether some line number in one file is identical to the same line number in the other file. Take, for example, one text file with lines A,B,C,D,E and another text file with lines A,C,D,E. Your suggested answer would indicate for this example that both text files only have line A in common, which is of course bollocks (because both files have A as well as C,D,E in common). –  Mar 19 '19 at 23:27
  • By the way, the issue is not as trivial as you seem to believe. Google for the LCS problem (Longest Common Subsequence problem) to see what would really be necessary to solve OP's problem. Note that the example I gave in my comment was just a simple example to help you understand. The whole thing can easily become much more complicated... –  Mar 19 '19 at 23:29