I want to use Python to get the 20 most common words from the descriptions of the 10 longest movies in data.csv. So far I have got the 10 longest movies, but I can't get the most common words from just those movies: my code gives the most common words from the whole of data.csv. I tried Counter, pandas, NumPy and Mathlib, but I have no idea how to make Python count words only in specific rows and one column (the movie descriptions) of the data table.
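One way to count words in specific rows and a single column is to select those rows and that column first, and only then count. Below is a minimal sketch. It assumes a small hypothetical DataFrame in place of data.csv and a hypothetical helper `top_words`. It uses `DataFrame.nlargest` to pick the longest rows and `collections.Counter` to do the counting:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for data.csv; in the question this comes from pd.read_csv("data.csv").
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "duration_min": [120.0, 95.0, 150.0, 80.0],
    "description": [
        "A man and a dog travel the world",
        "A short film about a cat",
        "The dog returns home after the war",
        "Nothing to see here",
    ],
})


def top_words(frame: pd.DataFrame, n_movies: int, n_words: int) -> list[tuple[str, int]]:
    """Count words only in the description column of the n_movies longest rows."""
    # nlargest keeps only the selected rows; ["description"] keeps only that column.
    descriptions = frame.nlargest(n_movies, "duration_min")["description"].dropna()
    counts = Counter(
        word
        for text in descriptions
        for word in text.lower().split()
    )
    return counts.most_common(n_words)


print(top_words(df, n_movies=2, n_words=3))
```

With `n_movies=2`, the description of "Movie B" (95 min) is never read, so "cat" cannot appear in the result. For the question's data, the same idea would be `top_words(df, 10, 20)`.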
My code:
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
small_df = df[['title','duration_min','description']]
result_time = small_df.sort_values('duration_min', ascending=False)
print("TOP 10 LONGEST: ")
print(result_time.head(n=10))
most_common = pd.Series(' '.join(result_time['description']).lower().split()).value_counts()[:20]
print("20 Most common words from TOP 10 longest movies: ")
print(most_common)
My output:
TOP 10 LONGEST:
title duration_min description
6840 The School of Mischief 253.0 A high school teacher volunteers to transform ...
4482 No Longer kids 237.0 Hoping to prevent their father from skipping t...
3687 Lock Your Girls In 233.0 A widower believes he must marry off his three...
5100 Raya and Sakina 230.0 When robberies and murders targeting women swe...
5367 Sangam 228.0 Returning home from war after being assumed de...
3514 Lagaan 224.0 In 1890s India, an arrogant British commander ...
3190 Jodhaa Akbar 214.0 In 16th-century India, what begins as a strate...
6497 The Irishman 209.0 Hit man Frank Sheeran looks back at the secret...
3277 Kabhi Khushi Kabhie Gham 209.0 Years after his father disowns his adopted bro...
4476 No Direction Home: Bob Dylan 208.0 Featuring rare concert footage and interviews ...
20 Most common words from TOP 10 longest movies:
a 10134
the 7153
to 5653
and 5573
of 4691
in 3840
his 3005
with 1967
her 1803
an 1727
for 1558
on 1528
their 1468
when 1320
this 1240
from 1114
as 1050
is 988
by 894
after 865
dtype: int64
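The counts above cover the whole file because `result_time.head(n=10)` is only printed. `result_time` itself still holds every row, and the `most_common` line joins all of its descriptions. The sketch below keeps the head in a variable first and counts only its `description` column. It runs on hypothetical toy data; the variable `top_n` is hypothetical, and `n=2` stands in for the question's `n=10`:

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv; the column names match the question.
df = pd.DataFrame({
    "title": ["Long One", "Medium One", "Short One"],
    "duration_min": [200.0, 150.0, 90.0],
    "description": ["War and peace and war", "Peace at last", "Cats everywhere"],
})

small_df = df[["title", "duration_min", "description"]]
result_time = small_df.sort_values("duration_min", ascending=False)

# head() returns a new DataFrame; printing it does not shrink result_time,
# so keep the selected rows in their own (hypothetical) variable.
top_n = result_time.head(n=2)  # the question uses n=10

most_common = (
    top_n["description"]
    .str.lower()     # normalise case
    .str.split()     # one list of words per description
    .explode()       # one row per word
    .value_counts()  # count each word, most frequent first
    .head(20)        # keep the 20 most common words
)
print(most_common)
```

In the original code, the equivalent one-line change is to join `result_time.head(10)['description']` instead of `result_time['description']`.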
Here is the data table: https://www.dropbox.com/s/hxch4v08bkthvz1/data.csv?dl=1
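The words in the output above are almost all articles, pronouns and prepositions. Also, `split()` leaves punctuation attached, so "war," and "war" would be counted separately. If only content words are wanted, the sketch below extracts words with a regular expression and drops a stop-word list. It assumes a hypothetical helper `content_word_counts` and a hypothetical `STOP_WORDS` set that contains just the 20 words from the output above. Longer ready-made lists exist, for example scikit-learn's `ENGLISH_STOP_WORDS`:

```python
import pandas as pd

# Hypothetical stop-word set: only the 20 words from the question's output.
STOP_WORDS = {
    "a", "the", "to", "and", "of", "in", "his", "with", "her", "an",
    "for", "on", "their", "when", "this", "from", "as", "is", "by", "after",
}


def content_word_counts(descriptions: pd.Series, n_words: int = 20) -> pd.Series:
    """Count words in a Series of texts, ignoring punctuation and STOP_WORDS."""
    words = (
        descriptions.dropna()
        .str.lower()
        # ASCII letters with optional inner apostrophes; digits and punctuation are dropped
        .str.findall(r"[a-z]+(?:'[a-z]+)*")
        .explode()  # one row per word
    )
    words = words[~words.isin(STOP_WORDS)]  # drop the function words
    return words.value_counts().head(n_words)


# Hypothetical descriptions standing in for the 10 selected rows.
sample = pd.Series([
    "A soldier returns from the war.",
    "After the war, a soldier waits for his family.",
])
counts = content_word_counts(sample)
print(counts)
```

For the question's data, this would be called as `content_word_counts(result_time.head(10)['description'])`.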