I am following a pandas tutorial, "How do I filter rows of a pandas DataFrame by column value?" (YouTube), which teaches how to show movies with a duration of at least 200 minutes.

The data:

#+BEGIN_SRC  python :results output  :session
import pandas as pd

# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('../data/imdbratings.csv')
print(movies.head())
print('\n', movies.shape)
#+END_SRC

#+RESULTS:
:    star_rating                     title  content_rating   genre  duration                                        actors_list
: 0          9.3  The Shawshank Redemption               R   Crime       142  [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
: 1          9.2             The Godfather               R   Crime       175    [u'Marlon Brando', u'Al Pacino', u'James Caan']
: 2          9.1    The Godfather: Part II               R   Crime       200  [u'Al Pacino', u'Robert De Niro', u'Robert Duv...
: 3          9.0           The Dark Knight           PG-13  Action       152  [u'Christian Bale', u'Heath Ledger', u'Aaron E...
: 4          8.9              Pulp Fiction               R   Crime       154  [u'John Travolta', u'Uma Thurman', u'Samuel L....
:
:  (979, 6)

Then build a list of booleans marking the movies whose duration is at least 200 minutes:

#+BEGIN_SRC  python :results output  :session
# create a list in which each element refers to a DataFrame row: True if the row satisfies the condition, False otherwise
booleans = [True if length >= 200 else False for length in movies.duration]
print(len(booleans), booleans[:5])
#+END_SRC

#+RESULTS:
: 979 [False, False, True, False, False]

Note that booleans holds only bool values. Then the magic happens:

#+BEGIN_SRC  python :results output  :session
# convert the list to a Series
is_long = pd.Series(booleans)
print(is_long.head())
print(movies[is_long])
#+END_SRC

#+RESULTS:
#+begin_example
0    False
1    False
2     True
3    False
4    False
dtype: bool
     star_rating                                          title  ...  duration                                        actors_list
2            9.1                         The Godfather: Part II  ...       200  [u'Al Pacino', u'Robert De Niro', u'Robert Duv...
7            8.9  The Lord of the Rings: The Return of the King  ...       201  [u'Elijah Wood', u'Viggo Mortensen', u'Ian McK...
17           8.7                                  Seven Samurai  ...       207  [u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
78           8.4                    Once Upon a Time in America  ...       229  [u'Robert De Niro', u'James Woods', u'Elizabet...
85           8.4                             Lawrence of Arabia  ...       216  [u"Peter O'Toole", u'Alec Guinness', u'Anthony...
142          8.3              Lagaan: Once Upon a Time in India  ...       224  [u'Aamir Khan', u'Gracy Singh', u'Rachel Shell...
157          8.2                             Gone with the Wind  ...       238  [u'Clark Gable', u'Vivien Leigh', u'Thomas Mit...
204          8.1                                        Ben-Hur  ...       212  [u'Charlton Heston', u'Jack Hawkins', u'Stephe...
445          7.9                           The Ten Commandments  ...       220  [u'Charlton Heston', u'Yul Brynner', u'Anne Ba...
476          7.8                                         Hamlet  ...       242  [u'Kenneth Branagh', u'Julie Christie', u'Dere...
630          7.7                                      Malcolm X  ...       202  [u'Denzel Washington', u'Angela Bassett', u'De...
767          7.6                It's a Mad, Mad, Mad, Mad World  ...       205  [u'Spencer Tracy', u'Milton Berle', u'Ethel Me...

[12 rows x 6 columns]
#+end_example

The booleans list that was passed in contains nothing but plain bool values.

How can movies[is_long] know that those values are the result of a comparison with 200 minutes?

AbstProcDo
  • Use `movies[movies.duration >= 200]` – jezrael Jun 22 '19 at 06:24
  • I think this should be a dupe - [boolean indexing](https://stackoverflow.com/q/17071871) – jezrael Jun 22 '19 at 06:25
  • Not sure I understand how `movies[is_long]` is supposed to "know" about the comparison with 200 minutes - `is_long` is created from `booleans`, which is built by the list comprehension, so it works nicely here. But `pd.Series` should be omitted, because with a different index it would raise an error or filter wrongly - here just use `print(movies[booleans])`. – jezrael Jun 22 '19 at 06:30
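
A minimal sketch of what the comments above describe, assuming a small made-up DataFrame in place of the IMDb file (the titles `A` to `D` and their durations are hypothetical). It suggests that a boolean mask carries no record of the `>= 200` comparison: pandas simply keeps the rows whose mask entry is `True`. A plain list is matched by position, whereas a boolean Series is aligned on index labels, which is why wrapping the list in `pd.Series` can misbehave once the DataFrame's index is not the default `0..n-1`.

```python
import pandas as pd

# Made-up stand-in for the movies DataFrame (values are illustrative only)
movies = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "duration": [142, 175, 200, 201]}
)

# Vectorized comparison: yields a boolean Series that shares movies' index
mask = movies.duration >= 200

# The list built as in the question holds the same True/False values
booleans = [length >= 200 for length in movies.duration]
assert mask.tolist() == booleans

# Either form selects the rows whose mask entry is True
long_from_series = movies[mask]
long_from_list = movies[booleans]
assert long_from_series.equals(long_from_list)

# With a non-default index, a plain list is still matched by position...
shifted = movies.set_index(pd.Index([10, 11, 12, 13]))
by_position = shifted[booleans]

# ...but pd.Series(booleans) gets a fresh 0..3 index that cannot be aligned
# with the labels 10..13, so pandas refuses the lookup
# (an IndexingError in recent pandas versions)
alignment_error = None
try:
    shifted[pd.Series(booleans)]
except Exception as exc:
    alignment_error = type(exc).__name__

print(long_from_series)
print(by_position)
print("Series mask on shifted index failed with:", alignment_error)
```

In the question's code the extra `pd.Series` step happens to work because `movies` still has the default integer index created by `read_csv`, so the labels of `is_long` line up with the rows.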
