1

I am trying to find a way to compare two strings and return the words they have in common. The strings are always lowercase. I want to write a function for this, for example:

str1 = "this is a test"
str2 = "saldkasl test asdasd"

result = stringcompare(str1, str2) 'returns "test"

The common word between the two strings should be "test". If the two strings have two or more common words, the function should concatenate them:

str1 = "this is another test"
str2 = "another asdsada test asdsa"
result = stringcompare(str1, str2) ' returns "another test"

I found a useful link that gave me an idea, but something is still lacking.

Pseudocode for what I'm doing right now:

'1st: split each string on spaces (" ") and store the words in an array or list
'2nd: compare each item in the two lists; if two items are equal, store the word in the variable 'result'

Is this okay? I think it is slow, and maybe someone out there has a better approach. Thanks.
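For reference, the two steps above could be sketched as follows. This is only a literal reading of the pseudocode, not a tuned solution. The module name `CommonWordsSketch` is made up for illustration, and skipping empty entries and repeated words are assumptions about the desired behavior.

```vbnet
Imports System
Imports System.Collections.Generic

Module CommonWordsSketch
    ' Literal version of the two steps: split on spaces, then compare every pair.
    Function StringCompare(ByVal str1 As String, ByVal str2 As String) As String
        ' Assumption: runs of spaces should not produce empty "words".
        Dim separators As Char() = New Char() {" "c}
        Dim words1 As String() = str1.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim words2 As String() = str2.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim common As New List(Of String)

        For Each w1 As String In words1
            For Each w2 As String In words2
                ' Assumption: a word that matches more than once is reported only once.
                If w1 = w2 AndAlso Not common.Contains(w1) Then
                    common.Add(w1)
                End If
            Next
        Next

        ' Common words come out in the order they appear in str1.
        Return String.Join(" ", common.ToArray())
    End Function
End Module
```

The nested loops compare every word of one string with every word of the other, so the work grows with the product of the two word counts. That is usually fine for short sentences.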

Codemunkeee
  • Methinks this is a sound solution already. How do you store it in a container? Perhaps that's the problem. – Malcolm Salvador Jan 15 '14 at 01:46
  • Split each string into an array, concatenate the arrays, then use LINQ to group the words and keep only those whose count equals the number of strings. For example, with 3 strings, keep the words whose count is 3. – user11982798 Sep 14 '19 at 02:26

2 Answers

3

As measured in VS 2013, the solution below is on average 20% faster than Guffa's:

Dim str1 As String = "this is another test"
Dim str2 As String = "another asdsada test asdsa"
Dim result As String = String.Join(" ", str1.Split(" "c).
                              Intersect(str2.Split(" "c)))

Results were obtained by running each solution 100000 times in a loop and measuring the elapsed time with Stopwatch.
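A measurement of that kind could be set up roughly as below. This is a sketch, not the harness actually used for the numbers above. `TimingSketch`, `TimeIt`, and `MeasureIntersect` are hypothetical names, and the code assumes VB 2010 or later with .NET 4 for the multi-line lambda and for the `String.Join` overload that takes a sequence.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Linq

Module TimingSketch
    ' Runs the given action the requested number of times and returns elapsed milliseconds.
    Function TimeIt(ByVal iterations As Integer, ByVal work As Action) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            work()
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    ' Times the Intersect-based one-liner on the sample strings from the question.
    Function MeasureIntersect(ByVal iterations As Integer) As Long
        Dim str1 As String = "this is another test"
        Dim str2 As String = "another asdsada test asdsa"

        Return TimeIt(iterations,
            Sub()
                Dim result As String = String.Join(" ", str1.Split(" "c).Intersect(str2.Split(" "c)))
            End Sub)
    End Function
End Module
```

Calling `MeasureIntersect(100000)` from a `Sub Main` and printing the result gives one data point. Timings from a release build, taken after an untimed warm-up call, tend to be more stable than a single run under the debugger.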

Victor Zakharov
  • When I tried this, the error reads: "Intersect is not a member of System.Array". I am using vb.net 2005 – Malcolm Salvador Jan 15 '14 at 01:54
  • @Malky.Kid: The above is Visual Studio 2010+. Is there any reason you need to stick with 2005? – Victor Zakharov Jan 15 '14 at 01:56
  • I see, thanks. The office does not want to invest in newer versions of Visual Studio :( – Malcolm Salvador Jan 15 '14 at 01:59
  • @Malky.Kid: You have two options - implement Intersect yourself (which should not be too hard anyway), or use [Linq Bridge](https://code.google.com/p/linqbridge/downloads/list), linked [from this question](http://stackoverflow.com/questions/982174/how-to-intersect-two-arrays). – Victor Zakharov Jan 15 '14 at 02:01
  • I get similar results for performance with very short strings, but the opposite for longer strings. – Guffa Jan 15 '14 at 11:49
  • @Guffa: Please feel free to share performance measurements in your answer. I'd be interested to know when it makes sense to use a hashset and when my approach is faster. Thanks. – Victor Zakharov Jan 15 '14 at 12:12
  • @Neolisk: I get a break even when the strings contain 8 words each. – Guffa Jan 15 '14 at 15:25
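For the "implement Intersect yourself" option mentioned in the comments, a version without LINQ might look like the sketch below. It assumes only .NET 2.0 generics (`Dictionary` and `List`), so no LINQ, lambdas, or `HashSet` are needed. `IntersectSketch` and `CommonWordsNoLinq` are hypothetical names, and the code has not been tried in VS 2005 itself.

```vbnet
Imports System
Imports System.Collections.Generic

Module IntersectSketch
    ' Hand-rolled stand-in for Intersect that avoids LINQ, lambdas, and HashSet.
    Function CommonWordsNoLinq(ByVal str1 As String, ByVal str2 As String) As String
        ' Keys are the words of the second string.
        ' The Boolean records whether a word has already been added to the result.
        Dim lookup As New Dictionary(Of String, Boolean)
        For Each word As String In str2.Split(" "c)
            lookup(word) = False
        Next

        Dim common As New List(Of String)
        For Each word As String In str1.Split(" "c)
            Dim emitted As Boolean
            If lookup.TryGetValue(word, emitted) AndAlso Not emitted Then
                common.Add(word)
                lookup(word) = True
            End If
        Next

        ' Distinct common words, in the order they appear in str1.
        Return String.Join(" ", common.ToArray())
    End Function
End Module
```

Like `Intersect`, this returns each common word once, ordered by its first appearance in the first string.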
1

Use a hash set for the words in the first string. Then you can loop through the words in the second string, check whether each one exists in the first, and get close to O(n) performance:

Dim first As New HashSet(Of String)(str1.Split(" "c))
Dim result As String() = str2.Split(" "c).Where(Function(s) first.Contains(s)).ToArray()
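Since the question asks for a single concatenated string, the two lines above could be wrapped in a function along these lines. This is a sketch rather than part of the original answer. `HashSetSketch` and `CommonWordsHashed` are hypothetical names, and the `Distinct()` call is an added assumption that a repeated word should be reported only once.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module HashSetSketch
    Function CommonWordsHashed(ByVal str1 As String, ByVal str2 As String) As String
        ' Hash the words of the first string so each lookup takes roughly constant time.
        Dim first As New HashSet(Of String)(str1.Split(" "c))

        ' Keep the words of the second string that also occur in the first.
        ' Distinct() (an assumption) drops repeated matches.
        Dim common As String() = str2.Split(" "c).
            Where(Function(s) first.Contains(s)).
            Distinct().
            ToArray()

        Return String.Join(" ", common)
    End Function
End Module
```

Note that the common words come out in the order they appear in the second string, which can differ from the order produced by an approach that walks the first string.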
Guffa