I want to create a string compare script in c++.
Total Commander file compare function is pretty good:
How does this algorithm work?
Can somebody share a snippet for this function?
I want to create a string compare script in c++.
Total Commander file compare function is pretty good:
How does this algorithm work?
Can somebody share a snippet for this function?
I cannot tell You, what total commander does. Perhaps one could disassemble it and try to trace the techniques.
But a common algorithm is this one :
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
A string search algorithm. It surely is useful for comparison as well.
Also please refer to this post : c++ string compare algorithm
Best regards
You may use the diff or LCS algorithm to do this kind of comparison.
Below is a straightforward C implementation of it:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
int lcs(const char* s1, const char* s2)
{
size_t l1 = strlen(s1), l2 = strlen(s2);
size_t sz = (l1 + 1) * (l2 + 1) * sizeof(size_t);
size_t w = l2 + 1;
size_t* dpt;
size_t i1, i2;
if (sz / (l1 + 1) / (l2 + 1) != sizeof(size_t) ||
(dpt = malloc(sz)) == NULL)
{
printf("Not enough memory\n");
return EXIT_FAILURE;
}
for (i1 = 0; i1 <= l1; i1++)
dpt[w * i1 + 0] = 0;
for (i2 = 0; i2 <= l2; i2++)
dpt[w * 0 + i2] = 0;
for (i1 = 1; i1 <= l1; i1++)
for (i2 = 1; i2 <= l2; i2++)
{
if (s1[l1 - i1] == s2[l2 - i2])
{
dpt[w * i1 + i2] = dpt[w * (i1 - 1) + (i2 - 1)] + 1;
}
else if (dpt[w * (i1 - 1) + i2] > dpt[w * i1 + (i2 - 1)])
{
dpt[w * i1 + i2] = dpt[w * (i1 - 1) + i2];
}
else
{
dpt[w * i1 + i2] = dpt[w * i1 + (i2 - 1)];
}
}
i1 = l1; i2 = l2;
for (;;)
{
if ((i1 > 0) && (i2 > 0) && (s1[l1 - i1] == s2[l2 - i2]))
{
printf("%c", s1[l1 - i1]);
i1--; i2--; continue;
}
else
{
if (i1 > 0 &&
(i2 == 0 || dpt[w * (i1 - 1) + i2] >= dpt[w * i1 + (i2 - 1)]))
{
printf("-%c", s1[l1 - i1]);
i1--; continue;
}
else if (i2 > 0 &&
(i1 == 0 || dpt[w * (i1 - 1) + i2] < dpt[w * i1 + (i2 - 1)]))
{
printf("+%c", s2[l2 - i2]);
i2--; continue;
}
}
break;
}
printf("\n");
free(dpt);
return EXIT_SUCCESS;
}
int main(int argc, char** argv)
{
const char *s1, *s2;
if (argc == 3)
{
s1 = argv[1]; s2 = argv[2];
}
else
{
printf("Usage:\n lcs-diff.exe <string1> <string2>\n\n");
s1 = "I ate apple on yesterday"; s2 = "I eat apple yesterday";
printf("Sample comparison:\n\n \"%s\" vs \"%s\":\n\n", s1, s2);
}
return lcs(s1, s2);
}
Output (ideone):
Usage:
lcs-diff.exe <string1> <string2>
Sample comparison:
"I ate apple on yesterday" vs "I eat apple yesterday":
I +eat-e apple -o-n- yesterday
From scratch, I would approach this problem like that (pseudocode):
String[] sarr1 = string1.split();
for (int i1 =0; i1<sarr1.length; i1++) {
if (!string2.contains(sarr[i1]) {
markWordRed(string1, sarr[i1]);
}
}
String[] sarr2 = string2.split();
for (int i2 =0; i2<sarr2.length; i2++) {
if (!string1.contains(sarr[i2]) {
markWordRed(string2, sarr[i2]);
}
}
Starting from that one may additionally:
check the order of the words, not only whether they exist or not
check similarity of each not-found word vs all not found words in the second string and show letter differences