0

I am trying to find a way to compare some text in 2 files and if a match is found then run a process.

Here are examples of the files;

'File A' = Automated list of text with this format;

example1
ex2
289 Example
fht_nkka

'File B' = File names from a directory search;

example1
test2
test4785

Using my 2 example files, I want to search them both and find matches.

So 'File A' above contains 'example1' and 'example1' is also in 'File B'. What I want to be able to do is create a 'string[] match' containing all the matches. Is there an easy way of doing this?

NOTE: these files do not always have the same line data or the same number of lines.

Pickled Egg
  • UPDATE - I can also change the layout of 'File A' if needed but the line items will be the same either way – Pickled Egg Sep 09 '13 at 12:31
  • have you tried anything? any code sample? – fhnaseer Sep 09 '13 at 12:34
  • I am a beginner with C# so not tried anything yet as I am stuck as to how to do it. I have other searches being run elsewhere but they are looking for static search criteria, this dynamic 'search and run' process has stumped me :S – Pickled Egg Sep 09 '13 at 12:52
  • If this project grows beyond the capabilities of your homegrown solution, you may want to look into [Lucene.Net](http://lucenenet.apache.org/) – mbeckish Oct 03 '13 at 13:19

4 Answers

1
  1. Use System.IO.File.ReadAllLines() on each of the two files to create two string arrays.
  2. Create a sorted version of array containing filenames to improve search performance. You can use LINQ for this purpose.
  3. Given that your first file has a fixed layout, the required filename should always be on line 4 of each record, so you can use a for loop with a fixed increment over the first array to read each required filename.
  4. Use Array.BinarySearch() to quickly locate whether that required filename exists in the list of files (the other array that is).

Here is a rough sketch of the code:

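// Requires: using System; using System.IO; using System.Diagnostics;
string[] AllRecs = System.IO.File.ReadAllLines(FIRST_FILE_PATH);
string[] AllFileNames = System.IO.File.ReadAllLines(SECOND_FILE_PATH);

// BinarySearch only works on a sorted array
Array.Sort(AllFileNames);

// Start at index 3 (the 4th line) and step by 8 lines per record
for (int i = 3; i < AllRecs.Length; i += 8)
{
    // C# indexes arrays with [], not ()
    if (Array.BinarySearch(AllFileNames, AllRecs[i] + ".exe") >= 0)
        System.Diagnostics.Process.Start(AllRecs[i] + ".exe");
}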
dotNET
  • Considering he is a beginner with C#, this approach is rather complex – Mauricio Gracia Gutierrez Sep 09 '13 at 12:58
  • Ya, I kind of agree, but then he has to start it from somewhere, so why not today. :) – dotNET Sep 09 '13 at 13:10
  • Sorted array is not the most efficient. HashSet is, plus is simpler. Then a simple contains would yield a result in approx constant time vs n squared time. – Aron Sep 09 '13 at 13:16
  • @dotNET why the i += 8 ? – Mauricio Gracia Gutierrez Sep 09 '13 at 13:44
  • I take it the i += 8 is for the end of the file. The file is dynamic and may not be only 8 lines long. It gets overwritten each time, so the beginning would always be 3 lines. Again, these 3 lines can be removed if it makes the process easier – Pickled Egg Sep 09 '13 at 15:31
  • @Aron: +1 for HashSet. N squared time is a bit of exaggeration though. I'm using BinarySearch on sorted array, so the average time would be near Log(N). For the rest of us, the structure of your *File A* appears to have 8 lines of text per record, if I understand correctly. Starting with Line[3] (i.e. 4th line) would give you the file name from the first record. Adding 8 in the next iteration will move the index to the file name line of the 2nd record and so on. – dotNET Sep 09 '13 at 16:35
  • The file gets recreated at each run. It may not always be 8 lines long. I can remove the first 3 lines from the file, as they are something the code creates earlier. That way it will just be a list of text – Pickled Egg Sep 10 '13 at 20:59
1

Managed to sort this out, here is what I have done;

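// Requires: using System.IO; using System.Collections.Generic;
var fileAcontents = File.ReadAllLines(fileA);
var fileBcontents = File.ReadAllLines(fileB);

// Put File A's lines in a set so each lookup is fast
HashSet<string> hashSet = new HashSet<string>(fileAcontents);
foreach (string i in fileBcontents)
{
    if (hashSet.Contains(i))
    {
        // <- DO SOMETHING :)
    }
}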
Pickled Egg
  • This is not a universal solution if you want to compare 2 files like a diff tool. It will not work if `fileA` has duplicate lines. `HashSet` cannot contain duplicates. – Ionut Enache Aug 19 '20 at 12:54
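
If duplicate lines do matter, as the comment above suggests, one option is to count how often each line occurs in 'File A' and use up those counts while reading 'File B'. The following is only a sketch under assumptions: matching is exact and case-sensitive, and `DuplicateAwareMatcher` and `FindMatchesWithDuplicates` are hypothetical names, not part of any answer here.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateAwareMatcher
{
    // Hypothetical helper: a line from linesB counts as a match only as many
    // times as it occurs in linesA.
    public static List<string> FindMatchesWithDuplicates(
        IEnumerable<string> linesA, IEnumerable<string> linesB)
    {
        // Count the occurrences of each line in File A.
        var remaining = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string line in linesA)
        {
            int count;
            remaining.TryGetValue(line, out count); // count is 0 if the line is new
            remaining[line] = count + 1;
        }

        // Walk File B and consume one occurrence per match.
        var matches = new List<string>();
        foreach (string line in linesB)
        {
            int count;
            if (remaining.TryGetValue(line, out count) && count > 0)
            {
                matches.Add(line);
                remaining[line] = count - 1;
            }
        }
        return matches;
    }
}
```

For a plain "does this name exist in both files" check, the `HashSet` version above is still enough; the counting only becomes relevant when repeated lines have to be paired up one by one.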
0
//Keep in a list of strings with FileA contents 

List<string> linesOfFileA = new List<string>();
string line ;

using (StreamReader sr = new StreamReader(pathToFileA)) 
{
    //read each line of fileA
    line = sr.ReadLine();
    while(line != null)
    {
        linesOfFileA.Add(line) ;
        line = sr.ReadLine();
    }
}
//Now read the contents of FileB

string fileWithoutExtension ;
int posOfExtension ;

using (StreamReader srB = new StreamReader(pathToFileB)) 
{
    //read each line of fileB
    line = srB.ReadLine();
    while(line != null)
    {
        posOfExtension = line.LastIndexOf(".");

        if(posOfExtension < 0)
        {
            fileWithoutExtension = line ;
        }               
        else
        {
            fileWithoutExtension = line.Substring(0,posOfExtension) ;
        }

        //Check to see if the FileA contains file but without Extension
        if(linesOfFileA.Contains(fileWithoutExtension))
        {
            //Store into another list / or execute here
        }
        line = srB.ReadLine();
    }
}

In the first part of the code you can skip however many lines you need to, but with the format currently shown they will not affect your comparison.

Mauricio Gracia Gutierrez
  • ok, so I have removed the extensions from fileB so now the files are just lines of data. I have created the first list based on fileA. What would be the easiest way to check for 'matches' and write to string[] match = 'matches'? – Pickled Egg Oct 03 '13 at 12:04
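
For the follow-up question above, one possible way to collect the matches into a `string[]` is LINQ's `Enumerable.Intersect`, which yields the distinct elements found in both sequences. This is a sketch rather than a definitive answer: `MatchFinder` and `FindMatches` are made-up names for illustration, and it assumes the lines must match exactly (case-sensitive, no trimming).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchFinder
{
    // Hypothetical helper: returns the distinct lines that appear in both
    // inputs, in the order they first appear in linesA.
    public static string[] FindMatches(
        IEnumerable<string> linesA, IEnumerable<string> linesB)
    {
        return linesA.Intersect(linesB, StringComparer.Ordinal).ToArray();
    }
}
```

With the two files from the question it could be called as `string[] match = MatchFinder.FindMatches(File.ReadAllLines(fileA), File.ReadAllLines(fileB));`, where `fileA` and `fileB` are assumed to hold the two file paths.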
-1

Fill a dictionary object with the File A contents, then loop through the File B contents, querying the File A dictionary. The reason for using a dictionary object is its speed when you have a large amount of data.

Dictionary<int, string> FileA = new Dictionary<int, string>();
string sFileAList = dataFileA;

Loop through the File A contents and add them to the dictionary, where count is a counter.

int count = 0;
foreach (string s in sFileAList.Split('\n')) {
    count++;
    if (count > 3) FileA.Add(count, s); // skip the first 3 lines
}

Then compare while looping through File B contents.

foreach (string s in dataFileB.Split('\n')) {
    // Note: ContainsValue scans every entry, so each lookup is linear in the size of FileA
    if (FileA.ContainsValue(s)) {
        // Run exe
    }
}
Papa