1

I have read a lot of ways to find a substring in a string. But in my case, I need to find a string of words (ie substring) inside a string. We can achieve this in O(n^3) times which is a very bad time complexity

For example

sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]<br/>
phrases = ["jim tom", "likes"]

I want to find the complete phrase in the sentence irrespective of the position

In the above case, the output will be

[2, [0,1]]

Explanation: Wherever all words of phrase match in the sentence return the index of the sentence

1) The first phrase jim tom is only present in 2nd indexes of sentences which is tom does not like jim so return the 2nd index
2) Whereas the second phrase likes are present in 1st and 2nd array so return 0 and 1 indices

I did with brute force but that is not an efficient way to do it

final_arr = []
phrases.each do |phrase|
  temp_arr = []
  sentences.each_with_index do |sentence, index|    
    multiple_word_phrase  = phrase.split(" ")
    if multiple_word_phrase.length > 1
      flag = 1
      multiple_word_phrase.each do |word|
        if !sentence.include?(word)
          flag = 0
          break
        end
      end
      temp_arr << index if flag == 1
    else
      temp_arr << index if sentence.include?(phrase)
    end
  end
  final_arr << temp_arr if temp_arr.any?
end

Is there any efficient way to achieve this problem O(NlogN) Time. I think this can be achieved with dynamic programming but not sure how to do it

Aniket Tiwari
  • 3,561
  • 4
  • 21
  • 61
  • This code can be far simpler if you use more methods from [Enumerable](https://ruby-doc.org/core-2.7.0/Enumerable.html). I am happy to show you a cleaner way to write the code but I need to know what is the _exact_ expected output. – max pleaner Mar 07 '20 at 09:54
  • You need to specify the Ruby object you want to return. It may be implicit in your code but readers shouldn’t be required to figure it out. For example, you may want a hash like this: `{ "jim tom"=>["tom does not like jim"], "likes"=>["jim likes mary", "kate likes tom"] }`. – Cary Swoveland Mar 07 '20 at 10:01
  • @CarySwoveland Yes you are right somewhat but instead I need the index wherever the phrase is matching in the sentences array – Aniket Tiwari Mar 07 '20 at 10:03
  • Result should be `[[2], [0, 1]]`, btw – Ja͢ck Mar 07 '20 at 10:52
  • I don't know how you computed your time complexity but it seems incorrect, unless you do many assumptions: Size of `phrases` = Size of `sentences`, And each phrase has a constant number of words. If this is in the order of N then your time complexity will be O(N^4) (taking into account the first assumption) – Ayoub Omari Mar 07 '20 at 18:24

5 Answers5

1

Maybe something like this using each_with_index, and an array of arrays for the phrases (I think it's nicer):

sentences = ["jim likes mary", "kate likes tom", "tom does not like jim"]
phrases = [["jim", "tom"], ["likes"]]

final_arr = []
sentences.each_with_index do |sentence, index|
    phrases.each do |words|
        if words.all? { |word| sentence.include? word }
            final_arr << index
        end
    end
end

Demo

Though, it's basically the same complexity.

wp78de
  • 18,207
  • 7
  • 43
  • 71
1

There's not much you can optimise in terms of algorithm, but you can shorten the code a fair deal:

phrases.map do |phrase|
  words = phrase.split
  sentences.map.with_index do |sentence, index|
    index if words.all? { |word| sentence[word] }
  end.compact
end

Breakdown:

  • The end-result has the same dimension as phrases, so you can express that with a map operation.
  • Inside each result, the list of search results has at most the number of elements in sentences, so you'd either use filter() or map().compact
  • For the innermost loop, all?() is used to describe all words must exist inside each sentence.
Ja͢ck
  • 170,779
  • 38
  • 263
  • 309
1

Other option using Array#product:

# setup
mapped_phr = phrases.map(&:split).zip(0..)
mapped_sen = sentences.zip(0..)

# loop
res = mapped_phr
  .product(mapped_sen)
  .each_with_object(Hash.new { |h, k| h[k] = [] }) do |(phr, sen), h|
    h[phr.first] << sen.last if phr.first.all? { |e| sen.first.include? e }
  end

res #=> {["jim", "tom"]=>[2], ["likes"]=>[0, 1]}
res.values #=> [[2], [0, 1]]

Or you can join the phr.first to get a String as hash key.

iGian
  • 11,023
  • 3
  • 21
  • 36
1

I am not very familiar with Ruby, but if you have concepts like hashmaps and hashsets you can optimize it. As I mentioned in my comment, if you are convinced that the time complexity of your algorithm is O(N^3) then you can optimize it to O(N^2).

To do so, take the array sentences and transform it to a hashmap that associates to each word a Set of indexes where it appears. For your example it will look like: "jim" -> Set(0, 2), "tom" -> Set(1, 2), "kate" -> Set(1) etc... This will take a time complexity of O(N) (because of O(1) time complexity of both look up in hashmap and addition in Set)

Now for each phrase you split it and take the intersection of the Sets of its words. For example, the result of first phrase will be the intersection of Indexes_of("jim") and indexes_of("tom") which is Set(1). The intersection will take you O(N) for each phrase. Given that you have N phrases, the time complexity is O(N^2).

Ayoub Omari
  • 806
  • 1
  • 7
  • 24
  • Can we do this in O(NlogN) time – Aniket Tiwari Mar 08 '20 at 09:21
  • I don't think you can have a better time complexity than `O(N^2)`, because for each phrase you may have `O(N)` indexes and there is `O(N)` phrases. Constructing the result only will inevitably take `O(N^2)` – Ayoub Omari Mar 08 '20 at 12:35
1

You can speed the calculations as follows:

require 'set'

h = sentences.each_with_index.with_object({}) do |(str,i),h|
  h[i] = str.split.to_set
end
  #=> {0=>#<Set: {"jim", "likes", "mary"}>,
  #    1=>#<Set: {"kate", "likes", "tom"}>,
  #    2=>#<Set: {"tom", "does", "not", "like", "jim"}>} 

keys = h.keys
  #=> [0, 1, 2]

phrases.map do |p|
  pa = p.split
  keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end
  #=> [[2], [0, 1]]

The return value is not quite the return value required by the question: [2, [0,1]]. I suggest making all elements of this array arrays, however, even when they contain only a single element (e.g., [2]). This will make life easier for the coder down the road. If one wants [2, [0,1]], however, perform a simple adjustment at the end:

phrases.map do |p|
  pa = p.split
  keys.select { |j| pa.all? { |s| h[j].include?(s) } }
end.map { |e| e.size == 1 ? e.first : e }
  #=> [2, [0, 1]]

As the computational complexity of set lookups is close to O(1) (constant), the computational complexity of this approach is close to O(n2), where n is some measure to the sizes of sentences and phrases.

Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100