Suppose I have two files main.txt
and sub.txt
. Suppose both files have unique lines i.e. the same line of text does not occur twice in either file. Also suppose there are no empty lines in either file. Now, consider the files as sets of strings, with each member of the set occuring on a line. This is possible because of our uniqueness condition. Now suppose sub.txt
is a subset of main.txt
in this way. How do we compute the set difference of main.txt
and sub.txt
to produce a new file diff.txt
? To be clear, the lines of diff.txt should be those that occur in main.txt but not sub.txt. There should be no empty lines in diff.txt. Order in diff.txt is irrelevant.
Example
main.txt:
Hello
World
How
You
Are
sub.txt:
World
Hello
diff.txt:
How
Are
You
Bonus Questions
- How can I tell that one set is actually a subset of the other? This is an assumption in the question, but in practice we mightn't know this for sure and would want a way to check it automatically.
- How can I tell if the lines in each file are truly unique?
- How can I tell if there are no blank lines?