Find from a list of strings, from a list of strings

Question

I need help looping through a list of sentences (strings) and erasing the characters from a match onwards, where the match comes from another list of words.

sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow']

words = ['lucas mangulu', 'george smith', 'miyagi chung']

I know I have to loop over each element in the sentences list, but I'm stuck on how to find(), for example, the element at the same index in the words list within that sentence. The final result should be:

sentences = ['im not george smith my name is',
             'how shall i call you',
             'we have detected a']

#OR

sentences = ['im not george smith my name is lucas mangulu',
             'how shall i call you george smith',
             'we have detected a miyagi chung']

Possible duplicate of [How to replace multiple substrings of a string?](https://stackoverflow.com/questions/6116978/how-to-replace-multiple-substrings-of-a-string) — Shuvojit, Apr 11 '19 at 14:35
To answer the question of how to find the phrase in the sentence: in Python, `'my name is lucas mangulu thank you'.find('lucas mangulu')` will return 11, which is the position of `'lucas mangulu'` in the string. From there you can use slicing to extract what you need. — Alain, Apr 11 '19 at 14:40
Your example output is messed up and confusing. In your first output: `['im not george smith my name is',` you left `george smith` but in the others you remove all names. Why? — Error - Syntactical Remorse, Apr 12 '19 at 13:23
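
The `find()` plus slicing idea from the comment above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `cut_at` and its `keep_match` flag are hypothetical, and leaving the sentence unchanged when the phrase is absent is an assumption.

```python
def cut_at(sentence, phrase, keep_match=False):
    """Truncate `sentence` at the first occurrence of `phrase`.

    Hypothetical helper: the sentence is returned unchanged when the
    phrase does not occur in it (an assumption, not stated in the thread).
    """
    start = sentence.find(phrase)  # find() returns -1 when there is no match
    if start == -1:
        return sentence
    # Cut either before the phrase or right after it.
    end = start + len(phrase) if keep_match else start
    return sentence[:end].rstrip()  # drop the space left before the cut


print(cut_at('my name is lucas mangulu thank you', 'lucas mangulu'))
print(cut_at('my name is lucas mangulu thank you', 'lucas mangulu', keep_match=True))
```

Checking for `-1` matters here: slicing with `sentence[:-1]` would silently drop the last character instead of signalling that nothing was found.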

Ralf · Answer 1 · 2019-04-12T10:20:27.597

I have difficulty understanding exactly what you are looking for, but here is a simple idea to remove the strings in words from the strings in sentences, using repeated calls to str.replace().

>>> words = ['lucas mangulu', 'george smith', 'miyagi chung']
>>> original_sentences = [
...     'im not george smith my name is lucas mangulu thank you',
...     'how shall i call you george smith oh okay got it',
...     'we have detected a miyagi chung in the traffic flow',
... ]
>>> original_sentences
['im not george smith my name is lucas mangulu thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']

>>> sentences = list(original_sentences)                  # make a copy
>>> for i in range(len(sentences)):
...     for w in words:                                   # remove words
...         sentences[i] = sentences[i].replace(w, '')
...     while '  ' in sentences[i]:                       # remove double whitespaces
...         sentences[i] = sentences[i].replace('  ', ' ')
>>> sentences
['im not my name is thank you',
 'how shall i call you oh okay got it',
 'we have detected a in the traffic flow']

Is this what you intended to do?

If you only want to replace one word in all the sentences, you could remove the nested for loop:

>>> sentences = list(original_sentences)                  # make a copy
>>> word_to_remove = words[0]                             # pick one
>>> for i in range(len(sentences)):
...     sentences[i] = sentences[i].replace(word_to_remove, '')
>>> sentences
['im not george smith my name is  thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']

Hi Ralf, that looks nice. The only thing is that I have to replace element by element: element [0] of words against element [0] of sentences. So my result for element [0] of sentences should be: 'im not george smith my name is' — Lucas Mengual, Apr 12 '19 at 06:26
@LucasMengual I don't think I understand completely, but I edited my answer to add another idea. — Ralf, Apr 12 '19 at 10:21
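
If the pairing described in the comments is meant literally (element i of words is searched only in element i of sentences), a sketch using `zip()` and `str.partition()` might look like this. The function name `truncate_pairwise` and the `keep_word` flag are hypothetical, and leaving a sentence untouched when its word is missing is an assumption.

```python
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]
words = ['lucas mangulu', 'george smith', 'miyagi chung']


def truncate_pairwise(sentences, words, keep_word=False):
    """Cut sentences[i] at the first occurrence of words[i].

    Hypothetical helper; a sentence whose word is absent is kept as-is
    (an assumption, the thread does not cover that case).
    """
    result = []
    for sentence, word in zip(sentences, words):
        # partition() splits around the first match: (before, match, after).
        before, found, _after = sentence.partition(word)
        if not found:
            result.append(sentence)
        elif keep_word:
            result.append(before + found)
        else:
            result.append(before.rstrip())
    return result


without_word = truncate_pairwise(sentences, words)
with_word = truncate_pairwise(sentences, words, keep_word=True)
print(without_word)
print(with_word)
```

With the data from the question, `keep_word=False` corresponds to the first expected output and `keep_word=True` to the second one after `#OR`.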

score 0 · Answer 2 · answered Apr 12 '19 at 13:40

You give two example outputs for one input, which is extremely confusing. The following code may help you but I can't logically figure out how to match your example exactly.

That being said I have a hunch this is what you are looking for.

import re
sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow',
             'Is this valid?']

words = ['lucas mangulu', 'george smith', 'miyagi chung', 'test']
occurrences = []
for sentence in sentences:
    # To find all occurrences in a sentence, use this line instead:
    # occurrences.append([(x.start(), x.end(), x.group()) for x in re.finditer('|'.join(words), sentence)])

    # Look for a word in this sentence (the first occurrence of any word)
    search_result = re.search('|'.join(words), sentence)
    # If we found a word in this sentence
    if search_result:
        occurrences.append((search_result.start(), search_result.end(), search_result.group()))
    else:  # No word found
        occurrences.append((0, 0, None))

# Example output 1:
# oc in this case is (start_index, end_index, word_found) for each sentence.
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[1]])

# Example output 2:
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[0]])

Example output 1:

im not george smith
how shall i call you george smith
we have detected a miyagi chung

Example output 2:

im not
how shall i call you
we have detected a
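
One caveat with joining the words using `'|'`: a phrase containing regex metacharacters (such as `.` or `+`) would be read as pattern syntax. A hedged variant that escapes each phrase with `re.escape()` and compiles the pattern once is sketched below. The function name `truncate_at_first_match` is hypothetical, and returning the sentence unchanged when nothing matches is an assumption that differs from the `(0, 0, None)` fallback above.

```python
import re

sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
    'Is this valid?',
]
words = ['lucas mangulu', 'george smith', 'miyagi chung', 'test']

# Escape every phrase so it is matched literally, and list longer phrases
# first so a short phrase cannot win over a longer one at the same position.
pattern = re.compile(
    '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
)


def truncate_at_first_match(sentence, keep_match=False):
    """Cut `sentence` at the earliest occurrence of any phrase in `words`."""
    match = pattern.search(sentence)
    if match is None:
        return sentence  # assumption: leave unmatched sentences untouched
    cut = match.end() if keep_match else match.start()
    return sentence[:cut].rstrip()


for s in sentences:
    print(truncate_at_first_match(s, keep_match=True))
```

As in the answer above, the cut happens at whichever phrase appears first in the sentence, so the first sentence is cut at 'george smith' rather than at 'lucas mangulu'.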
