According to your comments, the degree of similarity is calculated from the number of digits the two files have in common, regardless of their order.
Similarity between two files
The easiest way to do it would be to load two consecutive files (say, two open ifstreams sfs1 and sfs2) into two vectors:
std::vector<int> v1, v2;   // one vector per file, initially empty
copy(istream_iterator<int>(sfs1), istream_iterator<int>(), back_inserter(v1));
copy(istream_iterator<int>(sfs2), istream_iterator<int>(), back_inserter(v2));
Sort the vectors:
sort(v1.begin(), v1.end());
sort(v2.begin(), v2.end());
Then take the intersection of the two sorted vectors using the standard algorithm:
std::vector<int> sim;      // receives the common elements
set_intersection(v1.cbegin(), v1.cend(), v2.cbegin(), v2.cend(), back_inserter(sim));
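Note that set_intersection treats sorted ranges as multisets: a digit that appears m times in one file and n times in the other contributes min(m, n) elements to the result. The small sketch below (the helper name common_elements is hypothetical) shows this behaviour in isolation.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the elements common to a and b, counting
// duplicates as a multiset (a value occurring m times in a and n times in b
// appears min(m, n) times). Inputs are taken by value so they can be sorted.
std::vector<int> common_elements(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());   // set_intersection requires sorted input
    std::sort(b.begin(), b.end());
    std::vector<int> common;
    std::set_intersection(a.cbegin(), a.cend(), b.cbegin(), b.cend(),
                          std::back_inserter(common));
    return common;
}
```

For example, common_elements({3, 1, 1, 2}, {1, 2, 2, 5}) yields {1, 2}: the value 1 is counted once because the second input contains it only once, and 2 is counted once because the first input contains it only once.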
You then just have to compare the sizes:
cout << "Similar elements: " << sim.size()<<endl;
cout << "Similarity coefficient: "<< (double)sim.size()/max(v1.size(), v2.size())*100 <<"%"<<endl;
You still have to add some error handling in case both vectors are empty (which would cause a division by zero here).
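Putting these steps and the empty-input check together, a minimal sketch could look like this. The names read_sorted and similarity_percent are hypothetical, and returning std::optional (C++17) when both vectors are empty is only one possible way to avoid the division by zero; it leaves it to the caller to decide what two empty files mean.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: reads all whitespace-separated integers from a stream
// (std::ifstream and std::istringstream both work) and returns them sorted.
std::vector<int> read_sorted(std::istream& in)
{
    std::vector<int> v;
    std::copy(std::istream_iterator<int>(in), std::istream_iterator<int>(),
              std::back_inserter(v));
    std::sort(v.begin(), v.end());
    return v;
}

// Hypothetical helper: similarity in percent of two *sorted* vectors, i.e. the
// number of common elements divided by the size of the larger vector.
// Returns std::nullopt when both vectors are empty instead of dividing by 0.
std::optional<double> similarity_percent(const std::vector<int>& v1,
                                         const std::vector<int>& v2)
{
    const std::size_t denom = std::max(v1.size(), v2.size());
    if (denom == 0)
        return std::nullopt;
    std::vector<int> sim;
    std::set_intersection(v1.cbegin(), v1.cend(), v2.cbegin(), v2.cend(),
                          std::back_inserter(sim));
    return static_cast<double>(sim.size()) / static_cast<double>(denom) * 100.0;
}
```

For example, the sorted contents 1 2 3 4 5 6 7 8 and 2 3 4 5 6 7 8 9 share 7 values out of 8, so similarity_percent returns 87.5. Because read_sorted takes a std::istream&, the same code accepts a file stream or a string stream.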
Here is an online demo using stringstreams instead of file streams:
How to do it for several files
According to your question, you don't need to look for similarities in every possible pair of files, but only between consecutive files.
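So after you have compared the first two files, you just need to copy v2 into v1 (it is already sorted), clear v2 (back_inserter appends to whatever is already there), read the next file into v2, sort it, and calculate the new similarity. Clear sim before each new intersection as well.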
You also need a counter for consecutive near-matches. Increment it every time the similarity is above a certain threshold (e.g. 90%), and reset it to 0 every time the similarity falls below it. As soon as your counter reaches the number of consecutive near-matches that you expect, just stop :-)
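The whole loop can be sketched as follows, under a few assumptions: the names NextFile, file_reader and find_stable_run are hypothetical; reading is hidden behind a callable so the loop can be fed from files or from in-memory data; "above the threshold" is interpreted as greater than or equal; and two empty files are treated as 100% similar.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader type: appends the next file's numbers to `out` and
// returns false once there are no more files.
using NextFile = std::function<bool(std::vector<int>&)>;

// Hypothetical helper: a NextFile that reads the given paths in order.
// A file that cannot be opened simply yields no numbers; real code should
// check the stream state.
NextFile file_reader(std::vector<std::string> paths)
{
    return [paths = std::move(paths), i = std::size_t{0}](std::vector<int>& out) mutable {
        if (i == paths.size())
            return false;
        std::ifstream in(paths[i++]);
        std::copy(std::istream_iterator<int>(in), std::istream_iterator<int>(),
                  std::back_inserter(out));
        return true;
    };
}

// Compares each file only with the previous one. Returns the 0-based index of
// the file that completes `required` consecutive comparisons with a similarity
// of at least `threshold_percent`, or -1 if that never happens.
long find_stable_run(NextFile next, double threshold_percent, int required)
{
    std::vector<int> v1, v2, sim;
    if (!next(v1))
        return -1;                           // no files at all
    std::sort(v1.begin(), v1.end());
    int consecutive = 0;                     // length of the current run of near-matches
    for (long index = 1; ; ++index) {
        v2.clear();                          // back_inserter appends, so empty it first
        if (!next(v2))
            return -1;                       // ran out of files
        std::sort(v2.begin(), v2.end());
        sim.clear();
        std::set_intersection(v1.cbegin(), v1.cend(), v2.cbegin(), v2.cend(),
                              std::back_inserter(sim));
        const std::size_t denom = std::max(v1.size(), v2.size());
        // Assumption: two empty files count as identical (100%).
        const double similarity = denom == 0
            ? 100.0
            : static_cast<double>(sim.size()) / static_cast<double>(denom) * 100.0;
        if (similarity >= threshold_percent) {
            if (++consecutive == required)
                return index;
        } else {
            consecutive = 0;                 // run broken, start counting again
        }
        v1.swap(v2);                         // current file becomes the "previous" one
    }
}
```

A call such as find_stable_run(file_reader({"a.txt", "b.txt", "c.txt"}), 90.0, 2) would return the index of the file that completes the second consecutive near-match. Swapping v1 and v2 instead of copying has the same effect here, because v2 is cleared before every read, and it avoids copying the vector.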