
I have 2 text files with strings (a few hundred in each file). The idea is to compare the contents of the two: if a match is found, the string is written to one file; if no match is found, the string is written to a different file. Essentially, input file 1 contains a master list of names, and I am comparing that to input file 2. So we take name 1 on the master list and compare that name to every name in the other input file.

The main part I am stuck on is writing an algorithm that correctly traverses the files. I am also not sure whether I am comparing the strings correctly, but that could be a result of other errors. I am new to C++, so I am not fully aware of all the rules of the language.

There are 480 names on the master list and 303 names on the 2nd list, so there should be 303 matching names and 177 non-matching ones; I added counters to check that the numbers add up. First I tried a simple while loop that ran as long as input was being read from the master file, but it did not match all of the names (which makes sense). So I thought I needed something a little more complex: I read all of the values from each input file into their own arrays and tried to compare the elements of the arrays. I read and successfully printed the arrays, but I ran into other issues, namely a segmentation fault, which was apparently caused by sizeof(). I am still trying to troubleshoot both. I tried doing it like this:

//Had problems with making empty arrays
string arrMidasMaster[480];
string arrMidasMath[303];

for (int i = 0; i < sizeof(arrMidasMaster); ++i)
    {
        for (int j = 0; j < sizeof(arrMidasMath); ++j)
        {
            if (arrMidasMaster[i] == arrMidasMath[j]) //match
            {
                outData_Elig << arrMidasMaster[i] << endl;
                num_eligible ++; //counter
            }
            else                                      //No match
            {
                continue;
                //Where can I put these statements?
                //outData_Ineli << arrMidasMaster[i] << endl;
                //num_ineligible ++; //counter
            }
        }
    }

In the grand scheme of things this looks like it should do what I need, but it still needs work. Besides the segmentation fault, the if/else statement needs fixing. I used continue to keep going until a match is found, but if a match is never found, the code just goes back to the outer loop and tests the next name, whereas I want it to execute the 2 commented-out statements shown above. I hope this is enough to go on.

  • `sizeof(arrMidasMaster)` is the size of the array **in bytes**. If you want the size of the array that's just `std::size(arrMidasMaster)` – john Jun 20 '23 at 15:02
  • *Namely segmentation fault, which was apparently caused by sizeof()* -- If you printed the value of `sizeof(arrMidasMaster)`, you would quickly see it is not correct. – PaulMcKenzie Jun 20 '23 at 15:03
  • Plus, as you say, the logic is not correct, but I think first you need to learn how to loop through an array. – john Jun 20 '23 at 15:04
  • BTW there is no point at all in putting `continue` at the end of a loop, because that's what happens anyway at the end of a loop. – john Jun 20 '23 at 15:05
  • On the plus side you are comparing the strings correctly, use `==` to compare two C++ strings for equality. – john Jun 20 '23 at 15:07
  • *The main part I am stuck on is making an algorithm that will correctly traverse the files* -- Read all contents of file 1 to a `std::vector`. Read all contents of file 2 to a `std::vector`. Then `std::sort` both vectors. Then use `std::set_intersection` to get all of the common strings, and `std::set_difference` for a list of strings that are in one but not in the other. – PaulMcKenzie Jun 20 '23 at 15:07
  • Instead of attempting to deal with files with several hundred strings, I suggest that you first attempt to deal with files with about 5 strings. That way, you will be able to run your program line by line in a [debugger](https://stackoverflow.com/q/25385173/12149471) while monitoring the values of all variables. Once you have gotten your program to work with 5 strings, you can then increase the number of strings in your files. – Andreas Wenzel Jun 20 '23 at 15:08
  • `string arrMidasMaster[480];` -- What if there are not 480 names? The way you do this is to use a type that expands on each name in the file, and that is `std::vector`. – PaulMcKenzie Jun 20 '23 at 15:10
  • _Had problems with making empty arrays_ Indeed you would do - there's no such thing. @PaulMcKenzie tells you what you need to do in the comment above this one. – Paul Sanders Jun 20 '23 at 15:15
  • The problem description is too poor. Please provide examples of the input files and the expected output files. Also define precisely what `compare the contents of each` means. Those two `for` loops look suspicious. I do not think they do what was intended (even once the size issues are resolved). – Marek R Jun 20 '23 at 15:16
  • [Small sample](https://godbolt.org/z/q7xf7Kh9E). You should write small programs to test various things first. This simply takes two vectors of strings, and gets the similar and different strings. Once you see that working, then you expand this to reading in the data into those vectors instead of hardcoding the data. – PaulMcKenzie Jun 20 '23 at 15:40

3 Answers


Your example code has so many issues that going through each of them would take a long time. You need to read lines from your files and fill vectors with the text dynamically; then you don't need to specify the size of the arrays. You should also review how loops work in C++. Below is example code for what you are trying to do. Go through the code, read the comments to understand it, and adapt it to your use case. Try it on a small example first, then on your actual text files.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void compareFiles(const std::string& fileA, const std::string& fileB, const std::string& matchFile, const std::string& nonMatchFile) 
{
    std::ifstream inputFile1(fileA);
    std::ifstream inputFile2(fileB);
    std::ofstream outputFileMatch(matchFile);
    std::ofstream outputFileNonMatch(nonMatchFile);

    std::vector<std::string> masterList;  // to save the master list
    std::string line;

    // Read contents of fileA into a vector assuming each line has the text you want to match
    while (std::getline(inputFile1, line)) // reading until end of file
    {
        masterList.push_back(line); // push into the vector
    }

    // Compare contents of fileB with text from fileA in vector masterList
    while (std::getline(inputFile2, line)) // reading until end of file
    {
        if (std::find(masterList.begin(), masterList.end(), line) != masterList.end()) // check if the text in fileB is present in the fileA vector
        {
            outputFileMatch << line << "\n"; // write if match found
        } 
        else 
        {
            outputFileNonMatch << line << "\n"; // write if match not found
        }
    }

}

EDIT: As @john rightly pointed out in the comments, there is no need to close the files explicitly: they are closed automatically when the stream objects go out of scope. So I removed that code.

Anakin
  • I think this answer is pitched at just the right level. One very minor nitpick: you don't need to close the files at the end of the function, since that will happen automatically when the variables go out of scope. Sorry to mention it, but it always bugs me. – john Jun 20 '23 at 15:27
  • Thanks a lot! It read and printed all of the matches, but it did not read and print any of the non-matches. I have a counter in that else statement, but it didn't even get counted so it seems the else statement isn't even being executed. I can probably figure it out but thank you nonetheless. – Matthew Maisonave Jun 20 '23 at 16:30
  • @MatthewMaisonave I haven't actually run the code on files, so you have to debug it. Maybe use some prints inside the conditions to check. – Anakin Jun 21 '23 at 07:09
  • @Anakin I successfully debugged the program and posted my answer. Thank you for the help! – Matthew Maisonave Jun 23 '23 at 19:18

Not sure if I understood your problem correctly. Here is code that reads the files line by line.

If a line in the first file "in.txt" is present in the dictionary file "dic.txt", the line is written to "foundItmes.txt"; otherwise it is written to "remainintItmes.txt".

#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_set>

class LineHelper {
    std::string data;

public:
    friend std::istream& operator>>(std::istream& is, LineHelper& l);
    operator std::string() const { return data; }
};

std::istream& operator>>(std::istream& is, LineHelper& l)
{
    return std::getline(is, l.data);
}

using LineInputStreamIter = std::istream_iterator<LineHelper>;

std::unordered_set<std::string> loadDictionary(std::istream& in)
{
    return { LineInputStreamIter { in }, {} };
}

std::unordered_set<std::string> loadDictionary(const std::filesystem::path& p)
{
    std::ifstream f { p };
    return loadDictionary(f);
}

void partitionCopy(std::istream& in, const std::unordered_set<std::string>& dic, std::ostream& trueOut, std::ostream& falseOut)
{
    std::partition_copy(
        LineInputStreamIter { in }, {},
        std::ostream_iterator<std::string> { trueOut, "\n" },
        std::ostream_iterator<std::string> { falseOut, "\n" },
        [&dic](const auto& s) { return dic.count(s) != 0; });
}

void partitionCopy(const std::filesystem::path& in, const std::filesystem::path& dic, const std::filesystem::path& trueOut, const std::filesystem::path& falseOut)
{
    std::ifstream inF { in };
    std::ofstream trueOutF { trueOut };
    std::ofstream falseOutF { falseOut };
    partitionCopy(inF, loadDictionary(dic), trueOutF, falseOutF);
}

int main()
{
    partitionCopy("in.txt", "dic.txt", "foundItmes.txt", "remainintItmes.txt");
    return 0;
}

https://godbolt.org/z/73e4h46d3

I didn't test this thoroughly, but it should work.

Marek R

I was able to find the correct answer with everyone's help, thank you. The find() function searches vComp, from its first element to its last, for an element equal to line. The order in which the files are read matters: if you read the comparison file line by line and search for each line in the master list, every line will be found, because the comparison list is a subset of the master list. The non-matching names exist only in the master file, so they never end up in line and the else branch never runs. The same thing happens whenever you search for a subset inside the full set; to find the names that are missing, loop over the full (master) list and search the subset instead.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    //Variable Declaration
    ifstream inData_Master, inData_Comp;
    ofstream outData_Match, outData_NonMatch;
    int num_Matches = 0, num_NonMatches = 0; // counters
    vector<string> vComp; // saves the compare list
    string line;

    inData_Master.open("Input_File_Master.txt");
    inData_Comp.open("Input_File_Comp.txt");
    outData_Match.open("Output_File_Match.txt");
    outData_NonMatch.open("Output_File_NonMatch.txt");

    if (!inData_Master || !inData_Comp) // stop early if an input file is missing
    {
        cerr << "Could not open an input file" << endl;
        return 1;
    }

    // Reads & saves the contents of the Comp input file, which will be compared against the Master file
    while (getline(inData_Comp, line))
    {
        vComp.push_back(line); // adds a new element to the end of the vector
    }
    // Reads the master file one line at a time and looks each name up in vComp
    while (getline(inData_Master, line))
    {
        if (find(vComp.begin(), vComp.end(), line) != vComp.end()) // find returns vComp.end() when there is no match
        {
            outData_Match << line << endl; // match found
            num_Matches++; // counts match
        }
        else
        {
            outData_NonMatch << line << endl; // match not found
            num_NonMatches++; // counts non-match
        }
    }
    cout << "Matches: " << num_Matches << endl;
    cout << "NonMatches: " << num_NonMatches << endl;
}