1

I'm writing a function that counts how many times a word occurs in a list of elements read from a text file. That part is pretty straightforward.

However, I have spent two days trying to figure out how to count occurrences of a string that contains multiple words (two or more).

So for example say the string is:

"hello bye"

and the list is:

["car", "hello","bye" ,"hello"]

The function should return the value 1 because the elements "hello" and "bye" only occur once consecutively.


The closest I've gotten to the solution is using

words[0:2] = [' '.join(words[0:2])]

which joins two elements together at a given index. This, however, doesn't solve the problem, because the input will be the element itself rather than an index.

Can someone point me in the right direction?

Moinuddin Quadri
  • This sounds like an [XY Problem](http://meta.stackexchange.com/a/66378/344593) to me: Can you explain what problem you're trying to solve with this code? There may be an easier solution to the core issue. – TemporalWolf Feb 08 '17 at 21:51
  • You can use a for loop with enumerate to keep the content *and* the index: http://stackoverflow.com/questions/22171558/what-does-enumerate-mean –  Feb 08 '17 at 23:07

4 Answers

1

Match the string with the join of the consecutive elements in the main list. Below is the sample code:

my_list = ["car", "hello","bye" ,"hello"]
sentence = "hello bye"
word_count = len(sentence.split())
c = 0

for i in range(len(my_list) - word_count + 1):
    if sentence == ' '.join(my_list[i:i+word_count]):
        c+=1

The final value held by c will be:

>>> c
1
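If you need this in more than one place, the loop can be wrapped in a function. This is only a minimal sketch: the name count_phrase is made up for illustration, and it compares list slices directly instead of joining them into a string.

```python
def count_phrase(words, phrase):
    """Count how often the words of `phrase` appear consecutively in `words`."""
    target = phrase.split()
    n = len(target)
    if n == 0:
        return 0  # assumption: an empty phrase counts as zero matches
    # Compare each window of n consecutive elements with the phrase's words.
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))


print(count_phrase(["car", "hello", "bye", "hello"], "hello bye"))  # 1
```

If the phrase is longer than the list, the range is empty and the function returns 0.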

If you are looking for a one-liner, you may use zip and sum as:

>>> my_list = ["car", "hello","bye" ,"hello"]
>>> sentence = "hello bye"
>>> words = sentence.split()

>>> sum(1 for i in zip(*[my_list[j:] for j in range(len(words))]) if list(i) == words)
1
Moinuddin Quadri
1

Let's split this problem into two parts. First, we define a function that returns the n-grams of a given list, that is, all runs of n consecutive elements:

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

We can now get 2, 3 or 4-grams easily:

>>> ngrams(["car", "hello","bye" ,"hello"], 2)
[('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 3)
[('car', 'hello', 'bye'), ('hello', 'bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 4)
[('car', 'hello', 'bye', 'hello')]

Each item is made into a tuple.

Now make the phrase 'hello bye' into a tuple:

>>> as_tuple = tuple('hello bye'.split())
>>> as_tuple
('hello', 'bye')
>>> len(as_tuple)
2

Since the phrase has 2 words, we need to generate bigrams from the list of words and count how many of them match. We can generalize all this to:

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

def count_occurrences(sentence, phrase):
    phrase_as_tuple = tuple(phrase.split())
    sentence_ngrams = ngrams(sentence, len(phrase_as_tuple))
    return sentence_ngrams.count(phrase_as_tuple)

print(count_occurrences(["car", "hello","bye" ,"hello"], 'hello bye'))
# prints 1
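
If you need to look up several phrases of the same length against one list, it may be worth counting every n-gram once with collections.Counter. A rough sketch, where ngram_counts is a hypothetical helper name and not part of the code above:

```python
from collections import Counter


def ngram_counts(words, n):
    """Return a Counter mapping each n-gram (as a tuple) to its frequency."""
    return Counter(zip(*[words[i:] for i in range(n)]))


words = ["car", "hello", "bye", "hello"]
bigrams = ngram_counts(words, 2)

print(bigrams[("hello", "bye")])  # 1
print(bigrams[("bye", "car")])    # 0, a Counter returns 0 for missing keys
```

Each later lookup is then a dictionary access instead of another pass over the list.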
1

Two possibilities.

## laboriously

lookFor = 'hello bye'
words = ["car", "hello","bye" ,"hello", 'tax', 'hello', 'horn', 'hello', 'bye']

strungOutWords = ' '.join(words)

count = 0
p = 0
while True:
    # Search from position p onward instead of slicing a new string each time.
    q = strungOutWords.find(lookFor, p)
    if q == -1:
        break
    p = q + 1
    count += 1

print(count)

## using a regex

import re
# re.escape keeps regex metacharacters in the phrase from being interpreted.
print(len(re.findall(re.escape(lookFor), strungOutWords)))
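
One caveat with searching the joined string: a substring search also matches inside longer words. The sketch below shows this with made-up input. The lookaround pattern is my own assumption about how to restrict matches to whole elements, and it assumes the elements themselves contain no whitespace.

```python
import re

look_for = "hello bye"
words = ["othello", "byebye"]
joined = " ".join(words)

# A plain substring search finds the phrase inside the longer words.
print(look_for in joined)  # True

# Hypothetical stricter pattern: the match must not touch non-whitespace
# characters on either side, so only whole elements can match.
pattern = r"(?<!\S)" + re.escape(look_for) + r"(?!\S)"
print(len(re.findall(pattern, joined)))  # 0
```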
Bill Bell
0

I would suggest reducing the problem into counting occurrences of a string within another string.

words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
sentence = "hello bye"
my_text = " %s " % " ".join([item for sublist in [x.split() for x in words] for item in sublist])


def count(sentence):
    my_sentence = " %s " % " ".join(sentence.split())
    return my_text.count(my_sentence)


print(count("hello bye"))  # 2
print(count("pet shop"))   # 0
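
Keep in mind that str.count only counts non-overlapping matches. When the phrase repeats a word, two consecutive matches share an element (and a padding space), so the result can be lower than a window-by-window count. A small sketch with made-up input:

```python
words = ["a", "a", "a"]
phrase = "a a"

# Space-padded substring count, as in the approach above.
padded_text = " %s " % " ".join(words)
padded_phrase = " %s " % phrase
print(padded_text.count(padded_phrase))  # 1

# Sliding-window count over the list elements.
target = phrase.split()
n = len(target)
windows = sum(words[i:i + n] == target for i in range(len(words) - n + 1))
print(windows)  # 2
```

Which of the two numbers is correct depends on whether overlapping occurrences should count separately.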
kardaj
  • Nice catch, I edited my answer based on your insight, and added some preprocessing to the text. I hope it helps! – kardaj Feb 08 '17 at 22:41