I'm trying to use pandas instead of a for loop to count the number of movies in a given range of years. Assume my DataFrame has 2 columns and 'year' is the name of the second column.

I solved it using a for loop, but how would I do it using pandas only?

import pandas as pd

def movie_made(beginning, end):
   movie = pd.read_csv('title.csv')
   count = 0
   for i in movie['year']:
      # count years inside the inclusive range [beginning, end]
      if beginning <= i <= end:
         count = count + 1
   return count

This lets me count all the movies within a given range of years, but I'm wondering if there is a better way using the pandas infrastructure for reading from a database.

clumbzy1

3 Answers

You could do something like this:

import pandas as pd

df = pd.DataFrame(data=list(range(1980, 2001)), columns=['year'])
beginning, end = 1998, 2000


def movie_made(df, beginning, end):
    return len(df[(beginning <= df['year']) & (df['year'] <= end)].index)


print(movie_made(df, beginning, end))

Output

3
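
The same inclusive range check can also be written with `Series.between`, which is inclusive on both ends by default. This is a minimal sketch, assuming the same `year` column as above; the function name `movie_made_between` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': list(range(1980, 2001))})


def movie_made_between(df, beginning, end):
    # between() returns a boolean Series; summing it counts the True values
    return int(df['year'].between(beginning, end).sum())


print(movie_made_between(df, 1998, 2000))
```

Summing the boolean mask avoids building the filtered DataFrame when only the count is needed.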

Further

  1. How do I get the row count of a Pandas dataframe?
Dani Mesejo

Given a sample dataframe like this:

    movie   year
0      A    2016
1      B    2017
2      C    2018

You can set year as the index, use loc to select the years you want, and then get the count from shape:

movie.set_index('year').loc[[2016,2017]].shape[0]
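
Note that passing a list to `loc` selects exactly the listed years. For a contiguous range, one option is to sort the index and slice by label, which includes both endpoints. This is a sketch assuming the sample DataFrame above with integer years; the name `by_year` is made up for illustration.

```python
import pandas as pd

movie = pd.DataFrame({'movie': ['A', 'B', 'C'],
                      'year': [2016, 2017, 2018]})

# A sorted index lets .loc slice by label; both endpoints are included
by_year = movie.set_index('year').sort_index()
count = by_year.loc[2016:2017].shape[0]
print(count)
```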
Zito Relova
  • Is set_index a function, or am I just calling the column year? – clumbzy1 Nov 05 '18 at 01:42
  • set_index is a function that sets a specific column of the DataFrame as the index so that you can filter it using loc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html – Zito Relova Nov 05 '18 at 01:47

Yet another approach:

Data and data types

print(df)
  movie  year
0 xxxxx  2010
1 yyyyy  2011
2 zzzzz  2012

print(df.dtypes)
movie    object
year     object
dtype: object

Filter

startdate = 2010
enddate = 2011
years = range(startdate, enddate+1)
df_filtered = df[pd.to_datetime(df.year).dt.year.isin(years)]
print(df_filtered)
print('Number of rows in filtered DF = {}'.format(len(df_filtered)))

Output

  movie  year
0 xxxxx  2010
1 yyyyy  2011
Number of rows in filtered DF = 2
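
If the year column holds strings, as the `object` dtype above suggests, converting it with `pd.to_numeric` is another option that avoids the datetime round trip. This is a sketch under that assumption; the sample values mirror the data shown above.

```python
import pandas as pd

df = pd.DataFrame({'movie': ['xxxxx', 'yyyyy', 'zzzzz'],
                   'year': ['2010', '2011', '2012']})

# Convert the string years to integers, then keep rows in the inclusive range
years = pd.to_numeric(df['year'])
df_filtered = df[years.between(2010, 2011)]
print(len(df_filtered))
```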
edesz