2

I have a table containing full of movie genre, like this:

id | genre
---+----------------------------
1  | Drama, Romance, War
2  | Drama, Musical, Romance
3  | Adventure, Biography, Drama

Im looking for a way to get the most common word in the whole genre column and return it to a variable for further step in python.

I'm new to Python so I really don't know how to do it. Currently, I have these lines to connect to the database but don't know the way to get the most common word mentioned above.


    conn = mysql.connect()
    cursor = conn.cursor()
    most_common_word = cursor.execute()
    cursor.close()
    conn.close()

Phuc
  • 29
  • 5
  • Your challenge isn't "Python". You need to learn enough about *SQL* to query the table (using Python). STRONG SUGGESTION: 1) Glance at this tutorial: https://www.w3schools.com/sql/. *AT A MINIMUM*, familiarize yourself with `select`, `insert` and `select into`. 2) look at arheops [reply](https://stackoverflow.com/a/74749395/421195) below, and the [thread](https://stackoverflow.com/questions/17942508/sql-split-values-to-multiple-rows) he cited. 3) Create the "genre_words" table he suggested. 4) Once you've done all that, you can easily make the SQL ("select") query he suggested,. – paulsm4 Dec 10 '22 at 00:25
  • Sorry, but I'm new to MySQL too. This is the first time I use mysql. That's why I need help. – Phuc Dec 10 '22 at 00:28
  • This one task is not for beginners in sql. It is like the end of database course at university. Mainly because it requires some advanced logic to split row into multiple rows. If it is teaching task and it is not end of course, it is likely expected to be done as python task, not SQL. You can ask your teacher what is expected. – arheops Dec 10 '22 at 00:35
  • If this is an example for a class, that's one thing, but if it's for a real project, a more fundamental problem is `genre` is basically a table itself, so should be separated out so you can query it separately. That's what the relational part of relational databases is about. – John Bayko Dec 10 '22 at 00:47
  • @arheops That's the problem. This is the project in a class that database course is not a requirement. However the professor asking us to use database and we have to learn on our own. I will take database course next spring. This is also not a python class. The project is open for any language but my teammate decided to go with Python. I'm also not familiar with Python so I have to learned on my own. Basically the project is self-learning. Thanks for you guys help. – Phuc Dec 10 '22 at 00:51
  • @JohnBayko the genre is in a table alone because it's away for me to save the queries of user. I will find the most common word to suggest movie base on their past search. – Phuc Dec 10 '22 at 00:54
  • @Phuc If you need to rank the genres in the table, the relational way to do that is add a column with the rank as an integer (e.g. `id, rank, genre`). If rank isn't important, `id, genre` is enough, you can have several records with the same `id` if `genre` is different. – John Bayko Dec 10 '22 at 01:00

2 Answers2

2

First you need get list of words in each column. i.e create another table like

genre_words(genre_id bigint, word varchar(50))

For clues how to do that you may check this question:

SQL split values to multiple rows

You can do that as temporary table if you wish or use transaction and rollback. Which one to choose depend of your data size and PC on which DB running.

After that query will be really simple

select count(*) as c, word from genre_word group by word order by count(*) desc limit 1;

You also can do it using python, but if so it will not be a MySQL question at all. Need read table, create simple list of word+counter. If it new, add it, if exist - increase counter.

arheops
  • 15,544
  • 1
  • 21
  • 27
  • I would prefer a temporary table as I only want to count the most common word in the last few rows – Phuc Dec 10 '22 at 00:31
1
from collections import Counter

# Connect to database and get rows from table
rows = ...

# Create a list to hold all of the genres
genres = []

# Loop through each row and split the genre string by the comma character
# to create a list of individual genres
for row in rows:
    genre_list = row['genre'].split(',')
    genres.extend(genre_list)

# Use a Counter to count the number of occurrences of each genre
genre_counts = Counter(genres)

# Get the most common genre
most_common_genre = genre_counts.most_common(1)

# Print the most common genre
print(most_common_genre)
AlexK
  • 2,855
  • 9
  • 16
  • 27
abir sonet
  • 26
  • 4