-1

I want to create a string compare script in c++.
Total Commander file compare function is pretty good:

total commander sample


How does this algorithm work?
Can somebody share a snippet for this function?

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
user594297
  • 93
  • 2
  • 13
  • Did you hear about [diff](http://en.wikipedia.org/wiki/Diff) or [LCS](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem)? – Alexey Frunze Apr 16 '13 at 09:30
  • 1
    maybe try using something like hamming distance. also read this post http://stackoverflow.com/questions/6163611/compare-two-files – Gilad Apr 16 '13 at 09:31
  • @Androidy That is completely unrelated. – Alexey Frunze Apr 16 '13 at 09:33
  • why not @AlexeyFrunze?please explain hamming is for comparing words. you can apply it to buffers as well. not ideal like LCS. – Gilad Apr 16 '13 at 09:34
  • 1
    @Androidy That question is about checking whether or not two files are identical. Here the problem is different, the OP wants to see what has changed and where and not just detect that there's been some change. – Alexey Frunze Apr 16 '13 at 09:35
  • @AlexeyFrunze ok I get it. thanks – Gilad Apr 16 '13 at 09:36

3 Answers3

2

I cannot tell You, what total commander does. Perhaps one could disassemble it and try to trace the techniques.

But a common algorithm is this one :

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

A string search algorithm. It surely is useful for comparison as well.

Also please refer to this post : c++ string compare algorithm

Best regards

Community
  • 1
  • 1
icbytes
  • 1,831
  • 1
  • 17
  • 27
  • How do you propose to apply KMP to the OP's problem? – Alexey Frunze Apr 16 '13 at 09:34
  • KMP : String search alrgorithm = string (sub)string matching algorithm. Negating it will make a string "unmatching" algorithm. Right? – icbytes Apr 16 '13 at 09:36
  • I'm not following you. Can you give an example of how one would use KMP to find that between the two given strings the word `ate` changed to `eat` and then the word `on` was deleted? – Alexey Frunze Apr 16 '13 at 09:40
  • Hello. No. I must admit. Now not. But if You will let me take some time, i will try to develop a kind of solution. – icbytes Apr 17 '13 at 06:52
1

You may use the diff or LCS algorithm to do this kind of comparison.

Below is a straightforward C implementation of it:

#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int lcs(const char* s1, const char* s2)
{
  size_t l1 = strlen(s1), l2 = strlen(s2);
  size_t sz = (l1 + 1) * (l2 + 1) * sizeof(size_t);
  size_t w = l2 + 1;
  size_t* dpt;
  size_t i1, i2;

  if (sz / (l1 + 1) / (l2 + 1) != sizeof(size_t) ||
      (dpt = malloc(sz)) == NULL)
  {
    printf("Not enough memory\n");
    return EXIT_FAILURE;
  }

  for (i1 = 0; i1 <= l1; i1++)
    dpt[w * i1 + 0] = 0;
  for (i2 = 0; i2 <= l2; i2++)
    dpt[w * 0 + i2] = 0;

  for (i1 = 1; i1 <= l1; i1++)
    for (i2 = 1; i2 <= l2; i2++)
    {
      if (s1[l1 - i1] == s2[l2 - i2])
      {
        dpt[w * i1 + i2] = dpt[w * (i1 - 1) + (i2 - 1)] + 1;
      }
      else if (dpt[w * (i1 - 1) + i2] > dpt[w * i1 + (i2 - 1)])
      {
        dpt[w * i1 + i2] = dpt[w * (i1 - 1) + i2];
      }
      else
      {
        dpt[w * i1 + i2] = dpt[w * i1 + (i2 - 1)];
      }
    }

  i1 = l1; i2 = l2;
  for (;;)
  {
    if ((i1 > 0) && (i2 > 0) && (s1[l1 - i1] == s2[l2 - i2]))
    {
      printf("%c", s1[l1 - i1]);
      i1--; i2--; continue;
    }
    else
    {
      if (i1 > 0 &&
          (i2 == 0 || dpt[w * (i1 - 1) + i2] >= dpt[w * i1 + (i2 - 1)]))
      {
        printf("-%c", s1[l1 - i1]);
        i1--; continue;
      }
      else if (i2 > 0 &&
               (i1 == 0 || dpt[w * (i1 - 1) + i2] < dpt[w * i1 + (i2 - 1)]))
      {
        printf("+%c", s2[l2 - i2]);
        i2--; continue;
      }
    }

    break;
  }
  printf("\n");

  free(dpt);
  return EXIT_SUCCESS;
}

int main(int argc, char** argv)
{
  const char *s1, *s2;
  if (argc == 3)
  {
    s1 = argv[1]; s2 = argv[2];
  }
  else
  {
    printf("Usage:\n  lcs-diff.exe <string1> <string2>\n\n");
    s1 = "I ate apple on yesterday"; s2 = "I eat apple yesterday";
    printf("Sample comparison:\n\n  \"%s\" vs \"%s\":\n\n", s1, s2);
  }

  return lcs(s1, s2);
}

Output (ideone):

Usage:
  lcs-diff.exe <string1> <string2>

Sample comparison:

  "I ate apple on yesterday" vs "I eat apple yesterday":

I +eat-e apple -o-n- yesterday
Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
0

From scratch, I would approach this problem like that (pseudocode):

String[] sarr1 = string1.split();

for (int i1 =0; i1<sarr1.length; i1++) {
  if (!string2.contains(sarr[i1]) {
    markWordRed(string1, sarr[i1]);
  }
}

String[] sarr2 = string2.split();
for (int i2 =0; i2<sarr2.length; i2++) {
  if (!string1.contains(sarr[i2]) {
    markWordRed(string2, sarr[i2]);
  }
}

Starting from that one may additionally:

  • check the order of the words, not only whether they exist or not

  • check similarity of each not-found word vs all not found words in the second string and show letter differences

Dariusz
  • 21,561
  • 9
  • 74
  • 114