
Let's say I have a list of sentences:

sent = ["Chocolate is loved by all.", 
        "Brazil is the biggest exporter of coffee.", 
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa."]

I want to return all sentences that contain the exact, full word "chocolate" (ignoring case), i.e. ["Chocolate is loved by all.", "chocolate is made from cocoa."]. Sentences that do not contain the word "chocolate" should not be returned, and neither should a sentence where it only appears as part of a longer word such as "chocolateyyy".

How can I do this in Python?

DeltaMarine101

4 Answers


This only matches the search word when it appears as a whole, whitespace-separated word, rather than as part of a longer word like 'chocolateyyy'. It is also case-insensitive, so 'Chocolate' matches 'chocolate' even though the first letters are capitalised differently.

sent = ["Chocolate is loved by all.", "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.", "chocolate is made from cocoa.", "Chocolateyyy"]

search = "chocolate"

print([i for i in sent if search in i.lower().split()])

Here's a more expanded version for clarity with an explanation:

result = []
for i in sent:             # go through each string in sent
    lower = i.lower()      # make the string all lowercase
    words = lower.split()  # split on runs of whitespace into a list of words
    if search in words:    # is "chocolate" an entire element of the list?
        result.append(i)   # keep the original (non-lowercased) sentence
print(result)
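One limitation of splitting on whitespace is that punctuation stays attached to the word, so "I love chocolate." produces the token "chocolate." and is not matched. A minimal sketch that strips punctuation from the edges of each token first, assuming `string.punctuation` covers the characters you care about (`contains_word` and the extra test sentences are hypothetical, added for illustration):

```python
import string

def contains_word(sentence: str, word: str) -> bool:
    """Return True if `word` appears as a whole token, ignoring case and
    any punctuation at the start or end of each token."""
    target = word.lower()
    for token in sentence.lower().split():
        # strip leading/trailing punctuation such as '.', ',' or '"'
        if token.strip(string.punctuation) == target:
            return True
    return False

sent = ["Chocolate is loved by all.",
        "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa.",
        "I love chocolate.",   # hypothetical extra test: punctuation after the word
        "Chocolateyyy"]        # hypothetical extra test: longer word, must not match

print([s for s in sent if contains_word(s, "chocolate")])
# ['Chocolate is loved by all.', 'chocolate is made from cocoa.', 'I love chocolate.']
```

Punctuation inside a token (e.g. "chocolate-covered") is not stripped, so such tokens still won't match.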

I hope this helps :)

DeltaMarine101

For the sample list, a case-insensitive substring check gives the expected result:

filtered_sent = [i for i in sent if 'chocolate' in i.lower()]

Output

['Chocolate is loved by all.', 'chocolate is made from cocoa.']
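Because `in` on strings is a substring test, this also keeps sentences where "chocolate" is only part of a longer word. Running the same comprehension on the question's list plus an extra "Chocolateyyy" entry (added here for illustration) shows this:

```python
sent = ["Chocolate is loved by all.",
        "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa.",
        "Chocolateyyy"]  # extra test string, not in the original list

# substring test: True whenever "chocolate" occurs anywhere in the lowercased sentence
filtered_sent = [i for i in sent if 'chocolate' in i.lower()]
print(filtered_sent)
# ['Chocolate is loved by all.', 'chocolate is made from cocoa.', 'Chocolateyyy']
```

If whole-word matching matters, you need a token comparison or a word-boundary regex instead.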
Sociopath

For this problem, you want the re module. In particular, from its documentation:

\b Matches the empty string, but only at the beginning or end of a word.

You can therefore search for "chocolate" using re.search(r'\bchocolate\b', your_sentence, re.IGNORECASE).

The rest of the solution is just to iterate through your list of sentences and keep those for which re.search finds a match (it returns None when there is none).
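Put together, a minimal sketch of that approach might look like the following. The last two sentences are extra test strings (not from the question) showing that a trailing period still counts as a word boundary, while "Chocolateyyy" does not match:

```python
import re

sent = ["Chocolate is loved by all.",
        "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa.",
        "I love chocolate.",   # extra test: word followed by punctuation
        "Chocolateyyy"]        # extra test: longer word, should not match

# \b only matches at a word boundary, so "Chocolateyyy" fails:
# the character after "chocolate" is 'y', which is a word character
pattern = re.compile(r"\bchocolate\b", re.IGNORECASE)

# pattern.search returns a Match object (truthy) or None
matched = [s for s in sent if pattern.search(s)]
print(matched)
# ['Chocolate is loved by all.', 'chocolate is made from cocoa.', 'I love chocolate.']
```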

HenryLockwood

You can use Python's regular expression module, re:

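import re

sent = ["Chocolate is loved by all.",
        "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa."]
match_string = "chocolate"
# \b marks a word boundary; re.escape guards against regex special characters in the search word
pattern = re.compile(r"\b" + re.escape(match_string) + r"\b", re.IGNORECASE)
# re.search stops at the first match, so there is no need to collect every match with findall
matched_sent = [s for s in sent if pattern.search(s)]
print(matched_sent)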
Y.Wang