Searching over a list of individual sentences by a specific term in Python

Question

I have a list of terms in Python that looks like this:

Fruit
apple
banana
grape
orange

I also have a data frame with a column of lists of individual sentences, each of which may contain the name of one of those fruits. Something similar to this:

Customer     Review
1            ['the banana was delicious','he called the firetruck','I had only half an orange']
2            ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons']
3            ['It could use some more cheese','the grape and orange was sour']

I want to take the sentences in the Review column, match each one with the fruit it mentions, and output the result as a data frame. Something like this:

Fruit     Review
apple     ['there was a worm in my apple']
banana    ['the banana was delicious','I liked the banana']
grape     ['the grape and orange was sour']
orange    ['the grape and orange was sour','I had only half an orange']

How could I go about doing this?

How is this data stored? How are you keeping track of the reviews? You say it's a list, but it seems that you're mapping the customer number to an array in a dictionary? — Marcel M, Jul 28 '20 at 16:34
The data is stored in separate data frames. Customer number would best be interpreted as an ID to that specific customer. — Alokin, Jul 28 '20 at 16:40

M Z · Answer 1 · 2020-07-28T17:02:25.780

1

You can build a dictionary keyed by fruit, then check each review word by word:

# your fruits list
fruits = ["apple", "banana", "grape", "orange"]

reviews = [['the banana was delicious','he called the firetruck','I had only half an orange'], ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'], ['It could use some more cheese','the grape and orange was sour']]

# Initialize the dictionary, make each fruit a key
fruitReviews = {fruit.lower():[] for fruit in fruits}

# for each review, if a word in the review is a fruit, add it to that
# fruit's reviews list
for reviewer in reviews:
    for review in reviewer:
        for word in review.split():
            fruitReview = fruitReviews.get(word.lower(), None)
            if fruitReview is not None:
                fruitReview.append(review)
"""
result:
{
  "orange": [
    "I had only half an orange", 
    "the grape and orange was sour"
  ], 
  "grape": [
    "the grape and orange was sour"
  ], 
  "apple": [
    "there was a worm in my apple"
  ], 
  "banana": [
    "the banana was delicious", 
    "I liked the banana"
  ]
}
"""
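
The question asks for a data frame rather than a dictionary. As a minimal sketch, assuming pandas is available and that the dictionary has the shape shown in the result above, it could be converted as follows. The names fruit_reviews and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary built by the loop above.
fruit_reviews = {
    "apple": ["there was a worm in my apple"],
    "banana": ["the banana was delicious", "I liked the banana"],
    "grape": ["the grape and orange was sour"],
    "orange": ["I had only half an orange", "the grape and orange was sour"],
}

# One row per fruit; the Review column holds that fruit's list of sentences.
result = pd.DataFrame(
    {"Fruit": list(fruit_reviews.keys()), "Review": list(fruit_reviews.values())}
)
print(result)
```

Each cell of the Review column stays a Python list, which matches the layout requested in the question.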

edited Jul 28 '20 at 17:02

answered Jul 28 '20 at 16:39


For the second line in your for loop, for word in review.split():, I'm getting the error: 'list' object has no attribute 'split'. Would you know why this would come up? – Alokin Jul 28 '20 at 16:52
It comes up as a pandas.core.series.Series. If I make it into a list using reviews.to_list(), then it shows up as a list. – Alokin Jul 28 '20 at 16:57
yes but is it a list of all strings or a list of more lists? – M Z Jul 28 '20 at 16:58
it's a list of more lists – Alokin Jul 28 '20 at 17:02
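
The error in this thread appears when .split() is called on a list instead of a string. A minimal sketch of getting from a pandas column to individual sentences, assuming the frame is laid out like the table in the question (the names reviews_df, nested and sentences are illustrative):

```python
import pandas as pd

# Assumed layout, copied from the question: one list of sentences per customer.
reviews_df = pd.DataFrame({
    "Customer": [1, 2, 3],
    "Review": [
        ["the banana was delicious", "he called the firetruck", "I had only half an orange"],
        ["I liked the banana", "there was a worm in my apple", "Cantaloupes are better then melons"],
        ["It could use some more cheese", "the grape and orange was sour"],
    ],
})

# Series.tolist() returns a plain list whose elements are the per-customer lists.
nested = reviews_df["Review"].tolist()

# Flatten one level so every element is a single sentence (a str, which has .split()).
sentences = [sentence for customer_reviews in nested for sentence in customer_reviews]
print(len(sentences))
```

With the data in this shape, either a nested loop over nested or a single loop over sentences reaches strings before .split() is called.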

Marcel M · Accepted Answer · 2021-09-14T23:50:42.230

While the exact answer depends on how you're storing the data, I think the methodology is the same:

1. Create and store an empty list for every fruit name to store its reviews.
2. For each review, check each of the fruits to see if they appear. If a fruit appears in the comment at all, add the review to that fruit's list.

Here's an example of what that would look like:

#The list of fruits
fruits = ['apple', 'banana', 'grape', 'orange']

#The collection of reviews (based on the way it was presented, I'm assuming it was in a dictionary)
reviews = {
    '1':['the banana was delicious','he called the firetruck','I had only half an orange'],
    '2':['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'],
    '3':['It could use some more cheese','the grape and orange was sour']
}

fruitDictionary = {}
#1. Create and store an empty list for every fruit name to store its reviews
for fruit in fruits:
    fruitDictionary[fruit] = []
for customerReviews in reviews.values():
    #2. For each review,...
    for review in customerReviews:
        #...check each of the fruits to see if they appear.
        for fruit in fruits: 
            # If a fruit appears in the comment at all,...
            if fruit.lower() in review.lower():
                #...add the review to that fruit's list
                fruitDictionary[fruit].append(review)

This differs from previous answers in that each review is checked once per fruit rather than once per word, so sentences like "I enjoyed this grape. I thought the grape was very juicy" are only added to the grape section once.
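
Note that the in test above is a substring check, so "grape" would also be found inside "grapefruit". A minimal sketch of a whole-word variant, assuming that is the behavior wanted; the helper mentions is hypothetical and not part of the original answer:

```python
import re

fruits = ['apple', 'banana', 'grape', 'orange']

def mentions(fruit, review):
    """Return True if `fruit` occurs in `review` as a whole word, ignoring case."""
    pattern = r'\b' + re.escape(fruit) + r'\b'
    return re.search(pattern, review, flags=re.IGNORECASE) is not None

review = 'I enjoyed this grape. I thought the Grape was very juicy'

# Each fruit is tested once per review, so a repeated word yields a single match.
matched = [fruit for fruit in fruits if mentions(fruit, review)]
print(matched)  # ['grape']

# The word boundary keeps "grape" from matching inside "grapefruit".
print(mentions('grape', 'the grapefruit was bitter'))  # False
```

The condition in the loop could then be written as if mentions(fruit, review): with the rest of the code unchanged.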

If your data is stored as a list of lists, the process is very similar:

#The list of fruits
fruits = ['apple', 'banana', 'grape', 'orange']

#The collection of reviews
reviews = [
    ['the banana was delicious','he called the firetruck','I had only half an orange'],
    ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'],
    ['It could use some more cheese','the grape and orange was sour']
]

fruitDictionary = {}
#1. Create and store an empty list for every fruit name to store its reviews
for fruit in fruits:
    fruitDictionary[fruit] = []
for customerReviews in reviews:
    #2. For each review,...
    for review in customerReviews:
        #...check each of the fruits to see if they appear.
        for fruit in fruits: 
            # If a fruit appears in the comment at all,...
            if fruit.lower() in review.lower():
                #...add the review to that fruit's list
                fruitDictionary[fruit].append(review)

It's also significantly slower `O(n^4)` over the already bad `O(n^3)` — M Z, Jul 28 '20 at 17:14
How would you do this method if reviews was not a dictionary, but rather a list of lists where you only see the comments of the reviews? I.e.: reviews = { ['the banana was delicious','he called the firetruck','I had only half an orange'], ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'], ['It could use some more cheese','the grape and orange was sour'] } — Alokin, Jul 28 '20 at 19:38

Prayson W. Daniel · Answer 3 · 2020-07-28T17:49:39.190

You can use the .explode function to expand the reviews to one sentence per row, then use sets to find the intersection with the fruit names:

import pandas as pd

fruits = pd.DataFrame({'Fruit':'apple banana grape orange'.split()})

reviews = pd.DataFrame({'Customer':[1,2,3],
 'Review':[['the banana was delicious','he called the firetruck','I had only half an orange'],
           ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'],
           ['It could use some more cheese','the grape and orange was sour'],
           ]})

# review per row
explode_reviews = reviews.explode('Review')

# create a set
fruits_set = set(fruits['Fruit'].tolist())

# find intersection 
explode_reviews['Fruit'] = explode_reviews['Review'].apply(lambda x: ' '.join(set(x.split()).intersection(fruits_set)))

print(explode_reviews)

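
To reach the Fruit / Review layout asked for in the question, the exploded rows can be grouped by fruit. The following is a minimal sketch under the same assumptions as the code above (case-sensitive matching on whitespace-separated words); the names exploded, per_fruit and result are illustrative:

```python
import pandas as pd

fruits_set = {"apple", "banana", "grape", "orange"}

reviews = pd.DataFrame({
    "Customer": [1, 2, 3],
    "Review": [
        ["the banana was delicious", "he called the firetruck", "I had only half an orange"],
        ["I liked the banana", "there was a worm in my apple", "Cantaloupes are better then melons"],
        ["It could use some more cheese", "the grape and orange was sour"],
    ],
})

# One sentence per row, with a fresh unique index.
exploded = reviews.explode("Review").reset_index(drop=True)

# List the fruits named in each sentence (an empty list when none match).
exploded["Fruit"] = exploded["Review"].apply(
    lambda text: sorted(set(text.split()) & fruits_set)
)

# One row per (sentence, fruit) pair; sentences without a fruit become NaN and are dropped.
per_fruit = exploded.explode("Fruit").dropna(subset=["Fruit"])

# Gather the sentences for each fruit back into a list.
result = per_fruit.groupby("Fruit")["Review"].agg(list).reset_index()
print(result)
```

A sentence that names two fruits, such as "the grape and orange was sour", ends up in both lists.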

If you don’t want to explode your data, you can just do:

# ...

flatten = lambda l: [item for sublist in l for item in sublist]


reviews['Fruit'] = reviews['Review'].apply(lambda x: flatten([set(i.split()).intersection(fruits_set) for i in x]))


Credit for flatten code
