The data has two columns, title and genre. I am trying to print the title of each row whose genre matches the user's input.

Here is what I tried:

#CSV READ & GENRE-TITLE
data = pd.read_csv("data.csv")
df_title = data["title"]
df_genre = data["genre"]

#TOKENIZE
tokenized_genre = [word_tokenize(i) for i in df_genre]
tokenized_title = [word_tokenize(i) for i in df_title]

#INPUT-DATA MATCH
search = {e.lower() for l in tokenized_genre  for e in l}
choice = input('Please enter a word = ')

while choice != "exit":
    if choice.lower() in search:
        print(data.loc[data.genre == {choice}, 'title'])
    else:
        print("The movie of the genre doesn't exist")
    choice = input("Please enter a word = ")

But the result is: Series([], Name: title, dtype: object)

How can I solve it?

Edit: Data samples for title

0                              The Story of the Kelly Gang
1                                           Den sorte drøm
2                                                Cleopatra
3                                                L'Inferno
4        From the Manger to the Cross; or, Jesus of 
...

And for genres:

0          Biography, Crime, Drama
1                            Drama
2                   Drama, History
3        Adventure, Drama, Fantasy
4                 Biography, Drama
...
ilkebirsen
  • Could you provide some sample data so that we can give you accurate help? I guess you can keep one and only one [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) here. – swiss_knight Apr 04 '20 at 10:27
  • @s.k thanks for reminding, edited the post and added the samples. – ilkebirsen Apr 04 '20 at 10:55

1 Answer

One proposal based only on pandas

I would suggest something like this (please adapt it to your situation as you see fit; these are only general guidelines and hints to start from):

import pandas as pd

# Warning: some film titles contain commas and semicolons,
# so I had to use another separator when exporting the data to CSV.
# Here I chose the vertical bar '|'.

# CSV READ & GENRE-TITLE
data = pd.read_csv("data.csv", sep="|")

choice = input('Please enter a word = ')

while choice != "exit":
    choice = choice.lower()
    found = False
    for index, row in data.iterrows():
        # Case-insensitive substring match against the genre string
        if choice in row['genre'].lower():
            print(row['title'])
            found = True
    # Report a miss once per search, not once per non-matching row
    if not found:
        print("The movie of the genre {} doesn't exist".format(choice))
    choice = input("Please enter a word = ")
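
For comparison, the same filter can be written without an explicit loop. This is only a minimal sketch, assuming the `title` and `genre` columns from the question; the helper name `titles_for_genre` and the small inline sample frame are made up for illustration. It also hints at why `data.genre == {choice}` came back empty: that expression compares each genre string against a Python set, and even a plain string comparison would fail because a cell such as "Drama, History" is not equal to "history".

```python
import pandas as pd

# Small inline sample mirroring the rows shown in the question
# (the real data would come from pd.read_csv).
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm",
              "Cleopatra", "L'Inferno"],
    "genre": ["Biography, Crime, Drama", "Drama",
              "Drama, History", "Adventure, Drama, Fantasy"],
})


def titles_for_genre(df, word):
    """Hypothetical helper: titles whose genre string contains `word`, ignoring case."""
    # regex=False treats `word` as plain text; na=False skips missing genres.
    mask = df["genre"].str.contains(word, case=False, regex=False, na=False)
    return df.loc[mask, "title"]


print(titles_for_genre(data, "history").tolist())  # prints ['Cleopatra']
```

An empty result (`titles_for_genre(data, "western").empty`) can then stand in for the "doesn't exist" message.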


Edit

To generate a random number:

from random import randint
i = randint(0, len(data) - 1)  # randint includes both endpoints

Then use i as a positional index into your DataFrame (for example with data.iloc[i]).
I'll let you play around with this.
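
If the goal is to suggest a single random title among those that match the genre, one possible approach is to collect the matches first and then pick from that list. This is a sketch under the same assumptions as the question (`title` and `genre` columns); the function name `random_title` and the inline sample data are hypothetical.

```python
import random

import pandas as pd

# Inline sample standing in for the CSV described in the question.
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm",
              "Cleopatra", "L'Inferno"],
    "genre": ["Biography, Crime, Drama", "Drama",
              "Drama, History", "Adventure, Drama, Fantasy"],
})


def random_title(df, word):
    """Hypothetical helper: one random title whose genre contains `word`, or None."""
    mask = df["genre"].str.contains(word, case=False, regex=False, na=False)
    matches = df.loc[mask, "title"].tolist()
    if not matches:
        return None  # no film of that genre
    # random.choice picks one element, so no manual index arithmetic is needed
    return random.choice(matches)


print(random_title(data, "drama"))
```

Picking from the filtered list avoids the off-by-one risk of computing an index by hand.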



Useful links

Does Python have a string 'contains' substring method?
How to iterate over rows in a DataFrame in Pandas?

swiss_knight
  • Thanks, it helped; with a small modification it works now. One detail is left: I need it to give a random one from `row[title]`, but right now it lists all of them in the file. Do you have any suggestion? – ilkebirsen Apr 04 '20 at 18:21
  • 1
    Instead of printing the results, you can store them in a list or any other structure you like, then choose a random number `i` between 0 and the length of that structure minus one, and print only the i-th element. Doc: https://docs.python.org/3/library/random.html – swiss_knight Apr 04 '20 at 18:34
  • thanks again, couldn't handle that random number stuff but i handled with `result = [row['title']]` , `print(random.sample(result, 1)[0])` – ilkebirsen Apr 05 '20 at 10:57
  • I've added the random number generation. I'll let you try on your own; it's not that hard from here. – swiss_knight Apr 05 '20 at 11:15
  • ah i see your point now, but it should suggest only one 'title' / item randomly from list, i misrepresented. so it would be a useless to choose a random number to show right? or i misunderstood again? :D – ilkebirsen Apr 05 '20 at 12:13
  • Let's say your list has 150 films. Of those, 14 are of genre 'drama'. Keep this count in a variable. Then pick a random number between 0 and 13. Let's say that 6 comes out. Finally, print only the title at index 6 among the 14 that matched your 'genre' search. That's how I understood it. :-) – swiss_knight Apr 05 '20 at 13:18
  • well you understood it correctly, but didn't think with that way though, okay now i get it. i'll keep it for alternative way to run it, it seems works nice for now. thank you for the both ways. – ilkebirsen Apr 05 '20 at 13:46