I have a file that I read and split into fields inside a class. I also want to return the top n years with the highest number of movies produced, using the lines attribute to get the data.
import re
import collections

class movie_analyzer:
    def __init__(self,s):
        self.lines=open(s, encoding="latin-1").read().split('\n')
        self.lines=[x.split('::') for x in self.lines]

    def freq_by_year(self):
        movies_years = [x[3] for x in self.lines]
        c = collections.Counter(movies_years)
        for movies_years, freq in c.most_common(3):
            print(movies_years, ':', freq)

movie=movie_analyzer("modified.dat")
movie.freq_by_year()
It gives this error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-627-51913258f9e4> in <module>
----> 1 movie.freq_by_year()

<ipython-input-624-8dc663c0b252> in freq_by_year(self)
      9     def freq_by_year(self):
     10
---> 11         movies_years = [x[3] for x in self.lines]
     12
     13         c = collections.Counter(movies_years)

<ipython-input-624-8dc663c0b252> in <listcomp>(.0)
      9     def freq_by_year(self):
     10
---> 11         movies_years = [x[3] for x in self.lines]
     12
     13         c = collections.Counter(movies_years)

IndexError: list index out of range
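
One possible cause, assuming modified.dat ends with a newline or contains a blank line (I have not confirmed this): read().split('\n') then leaves an empty string at the end, which becomes a one-element row after split('::'), and x[3] fails on that row. A minimal sketch using an inline two-line sample in place of the real file:

```python
# Inline sample standing in for the file contents; note the final "\n".
text = (
    "1::Toy Story::Animation|Children's|Comedy::1995\n"
    "2::Jumanji::Adventure|Children's|Fantasy::1995\n"
)

# Same two splitting steps as in __init__ above.
rows = [x.split('::') for x in text.split('\n')]

print(len(rows))      # one more row than there are movies
print(rows[-1])       # the trailing empty string became a one-element row
print(len(rows[-1]))  # so index 3 does not exist on it
```
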
Also, movie.lines looks like this:
[['1', 'Toy Story', "Animation|Children's|Comedy", '1995'],
['2', 'Jumanji', "Adventure|Children's|Fantasy", '1995'],
['3', 'Grumpier Old Men', 'Comedy|Romance', '1995'],
['4', 'Waiting to Exhale', 'Comedy|Drama', '1995'],
['5', 'Father of the Bride Part II', 'Comedy', '1995'],
['6', 'Heat', 'Action|Crime|Thriller', '1995'],
['7', 'Sabrina', 'Comedy|Romance', '1995'],
['8', 'Tom and Huck', "Adventure|Children's", '1995'],
['9', 'Sudden Death', 'Action', '1995'],
['10', 'GoldenEye', 'Action|Adventure|Thriller', '1995']]
The .dat file looks like this:
Movies = ["1::Toy Story::Animation|Children's|Comedy::1995\n",
"2::Jumanji::Adventure|Children's|Fantasy::1995\n",
'3::Grumpier Old Men::Comedy|Romance::1995\n',
'4::Waiting to Exhale::Comedy|Drama::1995\n',
'5::Father of the Bride Part II::Comedy::1995\n']
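
Since every line in the file ends with "\n", a blank or short row in self.lines seems likely. Below is a hedged sketch of a variant that skips such rows and returns the top n years instead of printing them. The helper parse_rows and the n parameter are my own additions for illustration, and the four-field layout (id, title, genres, year) is assumed from the sample above.

```python
import collections


def parse_rows(text):
    """Split raw file text into '::'-separated rows, skipping blank lines."""
    return [line.split('::') for line in text.split('\n') if line.strip()]


class movie_analyzer:
    def __init__(self, path):
        # 'with' closes the file; latin-1 matches the encoding used above.
        with open(path, encoding="latin-1") as f:
            self.lines = parse_rows(f.read())

    def freq_by_year(self, n=3):
        # Count only rows that have all four fields (id, title, genres, year).
        years = [row[3] for row in self.lines if len(row) >= 4]
        # most_common(n) returns a list of (year, count) pairs, highest first.
        return collections.Counter(years).most_common(n)
```

Usage would then be something like movie_analyzer("modified.dat").freq_by_year(5), with the returned list printed or processed by the caller.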