1

Before marking this as duplicate, please read the details here.

Example 1:

String A: The seven habits of highly effective people.

String B: "This is a sample text. There is only one product in it. It is a book. The book is The seven habits of highly effective people."

Example 2:

String A: The seven habits of highly effective people.

String B: "This is a sample text. There is only one product in it. It is a book. The book is The seven habits of highly effective peopl."

Solving the above examples with code like
B.Contains(A)
gives the correct result ("true") for Example 1. However, the same code returns "false" for Example 2.

How do I resolve this problem?

There is an "e" missing in Example 2; I am aware of it, and that is exactly the problem. How do I compare one string with another when string A is nearly identical to a "part of string B"?

milan m
  • 2,164
  • 3
  • 26
  • 40
  • 4
    Example 2 doesn't contain A – Sayse Sep 11 '13 at 08:24
  • 1
    @user1039119 - The same code returns "false" as output in Example 2, as the complete string is not there. What do you want to achieve? – Bibhu Sep 11 '13 at 08:24
  • 1
    At the end of string B in ex.2, you have *peopl.* not *people.* – jwaliszko Sep 11 '13 at 08:25
  • 1
    The strings in example 2 are obviously different - if you want to get matches for "nearly identical" strings it gets difficult very fast, simply because defining "nearly identical" is fun. – Christian Sauer Sep 11 '13 at 08:25
  • Do you see difference between both examples? – Hamlet Hakobyan Sep 11 '13 at 08:26
  • Isn't people missing an 'e' in example 2? – Damon Sep 11 '13 at 08:26
  • @Christian, exactly, that's what I want. How do I define a nearly identical string? – milan m Sep 11 '13 at 08:26
  • There is an "e" missing in example 2 and I am aware of it. – milan m Sep 11 '13 at 08:27
  • You might consider something like the Levenshtein Distance algorithm, but I am unsure how well that will perform for such large input strings (it works fine for smaller ones). Wiki here: http://en.m.wikipedia.org/wiki/Levenshtein_distance – Simon Whitehead Sep 11 '13 at 08:28
  • 1
    Fyi I have used Levenshtein for caching "sounds like" results on small words and it was fine. I suggest benchmarking though. – Simon Whitehead Sep 11 '13 at 08:29
  • 2
    What you're looking for is measuring how similar two strings are, then setting some threshold for how similar "similar enough" is. [This question looks like a good place to start](http://stackoverflow.com/questions/9453731/how-to-calculate-distance-similarity-measure-of-given-2-strings) – BambooleanLogic Sep 11 '13 at 08:29

4 Answers

2

As stated in my comment, the Levenshtein Distance algorithm (and similar ones) computes the differences between two strings and returns a numerical result (wiki: http://en.m.wikipedia.org/wiki/Levenshtein_distance).

However, I would definitely apply benchmarking and caching strategies for these algorithms. They are decent with small input, but when I have implemented it I have had to make sure I cache results and lookups. Your large example will not perform "fast", depending on what "fast" means for your use case.
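For reference, here is a minimal sketch of the Levenshtein distance in C#, using the standard two-row dynamic-programming formulation. The class and method names (`EditDistance`, `Levenshtein`) are made up for illustration and are not part of any library; the running time grows with the product of the two string lengths, which is why benchmarking matters for long inputs.

```csharp
using System;

public static class EditDistance
{
    // Returns the minimum number of single-character insertions,
    // deletions and substitutions needed to turn s into t.
    // Hypothetical helper, written for illustration only.
    public static int Levenshtein(string s, string t)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (t == null) throw new ArgumentNullException(nameof(t));

        // Only two rows of the DP table are kept in memory.
        int[] previous = new int[t.Length + 1];
        int[] current = new int[t.Length + 1];

        // Distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i; // delete all i characters of s
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }

            // Swap rows for the next iteration.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[t.Length];
    }
}
```

Calling `EditDistance.Levenshtein("people", "peopl")` gives 1, since a single deletion separates the two words.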

Simon Whitehead
  • 63,300
  • 9
  • 114
  • 138
  • Hi Simon, thanks for your answer. I have gone through the details of the algorithm you mentioned. However, the problem I am facing is not about comparing two nearly identical strings. The problem is comparing one string with "part of another string" which is nearly identical. As shown in the example I mentioned, only a part of String B matches String A. Hence the distance algorithm may not give accurate results. – milan m Sep 11 '13 at 09:04
  • @user1039119 you can calculate it, for example, `bool contains = Math.Abs(Math.Abs(strB.Length - strA.Length) - levenshteinDistance) < 3;` – I4V Sep 11 '13 at 09:32
1

You can use string.Compare. Below are a few examples which may help you.

string a = "a"; 
string b = "b"; 
int c;

c = string.Compare(a, b);
Console.WriteLine(c);

c = string.CompareOrdinal(b, a);
Console.WriteLine(c);

c = a.CompareTo(b);
Console.WriteLine(c);

c = b.CompareTo(a);
Console.WriteLine(c);
rekaszeru
  • 19,130
  • 7
  • 59
  • 73
ZubinAmit
  • 76
  • 5
0

What you are looking for looks like a search engine with score rate.

I used the Levenshtein Distance method to search for and compare strings that look the same but are not.

There is an example at the following link:

http://www.dotnetperls.com/levenshtein

Chopchop
  • 2,899
  • 18
  • 36
0

I am answering my own question.

I was looking for a solution to compare one string with another where string A is nearly identical with a "part of string B".

This is how I resolved the issue.

  1. I applied the "Longest Common Substring" algorithm and found the longest common substring between the two strings.

  2. Then I used the "Levenshtein Distance" algorithm to compare my String A with the longest common substring found in step 1.

  3. If the similarity obtained in step 2 is above a certain threshold (that is, the edit distance is small enough), it implies that String A exists in String B.

  4. Problem Solved.

I worked on the problem for one day and got decent results with this approach.
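The steps above can be sketched roughly as follows. This is not the original code, only an illustration under assumptions: the names `FuzzyContains`, `LongestCommonSubstring`, `Levenshtein`, `IsApproximatelyContained` and the `maxDistance` parameter are all hypothetical, and the threshold is expressed as a maximum allowed edit distance.

```csharp
using System;

public static class FuzzyContains
{
    // Step 1: longest contiguous run of characters shared by a and b.
    public static string LongestCommonSubstring(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        int bestLength = 0, bestEndInA = 0;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    // Extend the common run ending at a[i-2], b[j-2].
                    current[j] = previous[j - 1] + 1;
                    if (current[j] > bestLength)
                    {
                        bestLength = current[j];
                        bestEndInA = i;
                    }
                }
                else
                {
                    current[j] = 0;
                }
            }
            int[] tmp = previous; previous = current; current = tmp;
        }
        return a.Substring(bestEndInA - bestLength, bestLength);
    }

    // Step 2: plain two-row Levenshtein distance.
    public static int Levenshtein(string s, string t)
    {
        int[] previous = new int[t.Length + 1];
        int[] current = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            int[] tmp = previous; previous = current; current = tmp;
        }
        return previous[t.Length];
    }

    // Step 3: treat a as "contained" in b when a is within maxDistance
    // edits of the longest substring it shares with b.
    // maxDistance is an assumed tuning parameter, not a value from the post.
    public static bool IsApproximatelyContained(string a, string b, int maxDistance)
    {
        string common = LongestCommonSubstring(a, b);
        return Levenshtein(a, common) <= maxDistance;
    }
}
```

With String A and String B from Example 2, the longest common substring is "The seven habits of highly effective peopl", which is two deletions away from A, so `IsApproximatelyContained(A, B, 2)` returns true. Because the common substring is a contiguous piece of A, the distance in step 2 equals the difference in length between the two, and a typo in the middle of the title would split the common substring and raise the distance sharply, so the threshold needs tuning against real data.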

milan m
  • 2,164
  • 3
  • 26
  • 40