I am following a pandas tutorial, "How do I filter rows of a pandas DataFrame by column value?" (YouTube), which teaches how to show movies with a duration of at least 200 minutes.
The data:
#+BEGIN_SRC python :results output :session
import pandas as pd

# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('../data/imdbratings.csv')
print(movies.head())
print('\n', movies.shape)
#+END_SRC
#+RESULTS:
: star_rating title content_rating genre duration actors_list
: 0 9.3 The Shawshank Redemption R Crime 142 [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
: 1 9.2 The Godfather R Crime 175 [u'Marlon Brando', u'Al Pacino', u'James Caan']
: 2 9.1 The Godfather: Part II R Crime 200 [u'Al Pacino', u'Robert De Niro', u'Robert Duv...
: 3 9.0 The Dark Knight PG-13 Action 152 [u'Christian Bale', u'Heath Ledger', u'Aaron E...
: 4 8.9 Pulp Fiction R Crime 154 [u'John Travolta', u'Uma Thurman', u'Samuel L....
:
: (979, 6)
Then construct a list, booleans, that records whether each movie's duration is at least 200 minutes:
#+BEGIN_SRC python :results output :session
# create a list in which each element refers to a DataFrame row: True if the row satisfies the condition, False otherwise
booleans = [True if length >= 200 else False for length in movies.duration]
print(len(booleans), booleans[:5])
#+END_SRC
#+RESULTS:
: 979 [False, False, True, False, False]
Note that booleans holds only bool values. However, this is where the magic happens:
#+BEGIN_SRC python :results output :session
# convert the list to a Series
is_long = pd.Series(booleans)
print(is_long.head())
print(movies[is_long])
#+END_SRC
#+RESULTS:
#+begin_example
0 False
1 False
2 True
3 False
4 False
dtype: bool
star_rating title ... duration actors_list
2 9.1 The Godfather: Part II ... 200 [u'Al Pacino', u'Robert De Niro', u'Robert Duv...
7 8.9 The Lord of the Rings: The Return of the King ... 201 [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK...
17 8.7 Seven Samurai ... 207 [u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
78 8.4 Once Upon a Time in America ... 229 [u'Robert De Niro', u'James Woods', u'Elizabet...
85 8.4 Lawrence of Arabia ... 216 [u"Peter O'Toole", u'Alec Guinness', u'Anthony...
142 8.3 Lagaan: Once Upon a Time in India ... 224 [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell...
157 8.2 Gone with the Wind ... 238 [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit...
204 8.1 Ben-Hur ... 212 [u'Charlton Heston', u'Jack Hawkins', u'Stephe...
445 7.9 The Ten Commandments ... 220 [u'Charlton Heston', u'Yul Brynner', u'Anne Ba...
476 7.8 Hamlet ... 242 [u'Kenneth Branagh', u'Julie Christie', u'Dere...
630 7.7 Malcolm X ... 202 [u'Denzel Washington', u'Angela Bassett', u'De...
767 7.6 It's a Mad, Mad, Mad, Mad World ... 205 [u'Spencer Tracy', u'Milton Berle', u'Ethel Me...
[12 rows x 6 columns]
#+end_example
The is_long Series that is passed in contains nothing but plain bool values.
How does movies[is_long]
know that those values came from comparing the duration with 200 minutes?
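One way to probe this is a small experiment, sketched below under stated assumptions: the three-row DataFrame `toy` and its titles and durations are made up for illustration and are not taken from imdbratings.csv. It compares a mask built from the comparison with a mask typed by hand, to see whether the selection depends on how the bool values were produced.

```python
import pandas as pd

# Hypothetical toy data, not taken from imdbratings.csv
toy = pd.DataFrame({'title': ['A', 'B', 'C'],
                    'duration': [120, 210, 200]})

# Mask built from a comparison, in the same way as the tutorial
from_comparison = pd.Series(
    [True if length >= 200 else False for length in toy.duration])

# Mask typed by hand, with no comparison involved at all
by_hand = pd.Series([False, True, True])

# Select rows with each mask
selected_a = toy[from_comparison]
selected_b = toy[by_hand]

print(selected_a)
# Check whether both masks select exactly the same rows
print(selected_a.equals(selected_b))
```

If both selections turn out identical, that would suggest the mask carries no record of the comparison, and that indexing with it simply keeps the rows whose label lines up with a True value.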