I have a dataset like this:

                      end station name   User Type
0                   Carmine St & 6 Ave  Subscriber
1           South End Ave & Liberty St  Subscriber
2        Christopher St & Greenwich St  Subscriber
3             Lafayette St & Jersey St  Subscriber
4                     W 52 St & 11 Ave  Subscriber
5              E 53 St & Lexington Ave  Subscriber
6                      W 17 St & 8 Ave  Subscriber
7                  St Marks Pl & 2 Ave  Subscriber
8    Grand Army Plaza & Central Park S    Customer
9               Barclay St & Church St  Subscriber
10       Washington St & Gansevoort St    Customer
11             E 37 St & Lexington Ave  Subscriber
12                     E 51 St & 1 Ave  Subscriber
13                     W 33 St & 7 Ave  Subscriber
14                 Pike St & Monroe St  Subscriber
15                E 24 St & Park Ave S  Subscriber
16                     1 Ave & E 15 St  Subscriber
17              Central Park S & 6 Ave    Customer
18                     E 39 St & 3 Ave    Customer
19                    W 59 St & 10 Ave  Subscriber
20              Central Park S & 6 Ave  Subscriber
21                     9 Ave & W 45 St    Customer
22                     8 Ave & W 33 St  Subscriber
23             Suffolk St & Stanton St  Subscriber
24                    W 47 St & 10 Ave  Subscriber
25                     W 33 St & 7 Ave  Subscriber
26                     8 Ave & W 33 St  Subscriber
27                     1 Ave & E 15 St    Customer
28                     8 Ave & W 33 St  Subscriber
29                     W 33 St & 7 Ave  Subscriber
...                                ...         ...
1085646               10 Ave & W 28 St  Subscriber
1085647         Central Park S & 6 Ave    Customer
1085648                W 52 St & 9 Ave  Subscriber
1085649         Perry St & Bleecker St  Subscriber
1085650        Allen St & E Houston St  Subscriber
1085651         Norfolk St & Broome St  Subscriber
1085652               11 Ave & W 27 St  Subscriber
1085653           John St & William St  Subscriber
1085654               W 43 St & 10 Ave    Customer
1085655       Cleveland Pl & Spring St  Subscriber
1085656   MacDougal St & Washington Sq    Customer
1085657       Elizabeth St & Hester St  Subscriber
1085658            St Marks Pl & 1 Ave  Subscriber
1085659                E 33 St & 2 Ave  Subscriber
1085660               W 56 St & 10 Ave  Subscriber
1085661  Brooklyn Bridge Park - Pier 2    Customer
1085662                W 21 St & 6 Ave  Subscriber
1085663            Bank St & Hudson St  Subscriber
1085664          Canal St & Rutgers St  Subscriber
1085665               10 Ave & W 28 St  Subscriber
1085666                9 Ave & W 16 St  Subscriber
1085667         Carlton Ave & Park Ave    Customer
1085668        Allen St & E Houston St  Subscriber
1085669        Allen St & E Houston St  Subscriber
1085670                8 Ave & W 31 St  Subscriber
1085671                9 Ave & W 14 St  Subscriber
1085672                E 25 St & 2 Ave  Subscriber
1085673                9 Ave & W 14 St    Customer
1085674              E 7 St & Avenue A  Subscriber
1085675        Allen St & Rivington St  Subscriber

Question

How many Customer rides end at a Central Park bike sharing station? Function a3() should return a Series object indexed by station names in descending order of popularity.

NOTE: Many station names indicate that the station is located at the intersection of two streets: E 17 St & Broadway or Broadway & E 14 St. Your answer should include any end station whose name contains Central Park.

My Code:

def a3(rides):
    df1 = rides[rides['User Type'] == 'Customer']
    df1 = rides['end station name'].str.contains('Central Park')
    central_park_total_rides = df1.value_counts().head()
    return central_park_total_rides

print a3(rides) # where 'rides' is dataset

Output:

False    1070953
True       14723
Name: end station name, dtype: int64

instead of a Series of station names in descending order.

Where did I make a mistake? Is there a better way to do this?

RaTh0D
  • `rides['end station name'].str.contains('Central Park')` returns boolean values – OneCricketeer Sep 07 '17 at 04:16
  • Possible duplicate of [How to filter rows containing a string pattern from a Pandas dataframe](https://stackoverflow.com/questions/27975069/how-to-filter-rows-containing-a-string-pattern-from-a-pandas-dataframe) – OneCricketeer Sep 07 '17 at 04:17
  • I want the output to be a **Series object indexed by station names in descending order of popularity**, as described here. If `df1.str.contains()` does not work, what are the other options? – RaTh0D Sep 07 '17 at 04:22
  • You need to use `df1 = df1[df1['end station name'].str.contains('Central Park')]`... You aren't filtering the dataframe, you are getting the boolean array of which items contain the string – OneCricketeer Sep 07 '17 at 04:29

2 Answers

This will return the value counts in descending order:

df1 = rides[rides['User Type'] == 'Customer']
mask = df1['end station name'].str.contains('Central Park')
df1.loc[mask, 'end station name'].value_counts()

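First, your second line references rides, not df1, in rides['end station name'].str.contains('Central Park'), so the Customer filter from the first line is discarded.

Second, df1['end station name'].str.contains('Central Park') returns a boolean Series, which is why calling value_counts() on it counts True and False. Use it as a mask to select the matching rows instead, then call value_counts() on the station names of those rows.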
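Putting the two fixes together, a3() could look roughly like the sketch below. The small `sample` frame is made up purely for illustration and only borrows the column names from the question.

```python
import pandas as pd


def a3(rides):
    """Count Customer rides per end station whose name contains 'Central Park'."""
    # Keep only the rides made by Customers.
    customers = rides[rides['User Type'] == 'Customer']
    # Boolean mask built on the filtered frame, not on the original one.
    mask = customers['end station name'].str.contains('Central Park')
    # value_counts() sorts by count in descending order by default.
    return customers.loc[mask, 'end station name'].value_counts()


# Made-up sample data (not taken from the real dataset).
sample = pd.DataFrame({
    'end station name': [
        'Central Park S & 6 Ave',
        'Central Park S & 6 Ave',
        'Grand Army Plaza & Central Park S',
        'Central Park S & 6 Ave',
        'E 39 St & 3 Ave',
    ],
    'User Type': ['Customer', 'Customer', 'Customer', 'Subscriber', 'Customer'],
})

result = a3(sample)
print(result)
```

The Subscriber ride to Central Park S & 6 Ave is excluded because the mask is computed on the Customer-only frame.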

user3212593

It is better to chain both conditions with & (and) than to filter twice:

# Combine both conditions into one boolean mask; each comparison needs its own parentheses.
mask = ((rides['User Type'] == 'Customer') &
        rides['end station name'].str.contains('Central Park'))
rides.loc[mask, 'end station name'].value_counts()
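
As a rough sketch, the single-mask approach can be wrapped in a function. The name `a3_combined` and the sample frame are hypothetical. Passing `na=False` and `regex=False` to `str.contains` is an extra precaution not mentioned in the question: it assumes station names may be missing and that 'Central Park' should be matched as a literal substring.

```python
import pandas as pd


def a3_combined(rides):
    """Single-pass variant: combine both conditions into one boolean mask."""
    is_customer = rides['User Type'] == 'Customer'
    # na=False treats missing station names as non-matches (assumption: they may occur);
    # regex=False matches the text literally instead of as a regular expression.
    at_central_park = rides['end station name'].str.contains(
        'Central Park', na=False, regex=False)
    return rides.loc[is_customer & at_central_park, 'end station name'].value_counts()


# Made-up sample data, including one missing station name.
sample = pd.DataFrame({
    'end station name': [
        'Central Park S & 6 Ave',
        None,
        'Grand Army Plaza & Central Park S',
        'Central Park S & 6 Ave',
        'W 33 St & 7 Ave',
        'Central Park S & 6 Ave',
    ],
    'User Type': ['Customer', 'Customer', 'Customer',
                  'Subscriber', 'Customer', 'Customer'],
})

counts = a3_combined(sample)
print(counts)
```

Without `na=False`, a missing name would put NaN into the mask, and boolean indexing with such a mask typically raises an error.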
jezrael