I have data like this:

                      end station name   User Type
0                   Carmine St & 6 Ave  Subscriber
1           South End Ave & Liberty St  Subscriber
2        Christopher St & Greenwich St  Subscriber
3             Lafayette St & Jersey St  Subscriber
4                     W 52 St & 11 Ave  Subscriber
5              E 53 St & Lexington Ave  Subscriber
6                      W 17 St & 8 Ave  Subscriber
7                  St Marks Pl & 2 Ave  Subscriber
8        Washington St & Gansevoort St    Customer
9               Barclay St & Church St  Subscriber
10       Washington St & Gansevoort St    Customer
11             E 37 St & Lexington Ave  Subscriber
12                     E 51 St & 1 Ave  Subscriber
13                     W 33 St & 7 Ave  Subscriber
14                 Pike St & Monroe St  Subscriber
15                E 24 St & Park Ave S  Subscriber
16                     1 Ave & E 15 St  Subscriber
17                  Broadway & W 32 St    Customer
18                     E 39 St & 3 Ave    Customer
19                    W 59 St & 10 Ave  Subscriber
20             Centre St & Chambers St  Subscriber
21                     9 Ave & W 45 St    Customer
22                     8 Ave & W 33 St  Subscriber
23             Suffolk St & Stanton St  Subscriber
24                    W 47 St & 10 Ave  Subscriber
25                     W 33 St & 7 Ave  Subscriber
26                     8 Ave & W 33 St  Subscriber
27                     1 Ave & E 15 St    Customer
28                     8 Ave & W 33 St  Subscriber
29                     W 33 St & 7 Ave  Subscriber
...                                ...         ...

I want to find the five most popular end stations for Customers, in descending order of popularity.

Here is my code:

import pandas as pd
rides = pd.read_csv(csv_file_path, low_memory=False, parse_dates=True)
five_popular_station_end_trip = rides['end station name'].value_counts().head()

I can find the most popular stations from one column, but I have no idea how to find them based on the value in another column.

James Draper
RaTh0D
  • Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – James Draper Sep 06 '17 at 05:57
  • How does my solution work? – jezrael Sep 07 '17 at 05:12

1 Answer

I think you need to filter first with boolean indexing:

df1 = rides[rides['User Type'] == 'Customer']
five_popular_station_end_trip = df1['end station name'].value_counts().head()
print (five_popular_station_end_trip)
Washington St & Gansevoort St    2
Broadway & W 32 St               1
1 Ave & E 15 St                  1
E 39 St & 3 Ave                  1
9 Ave & W 45 St                  1
Name: end station name, dtype: int64
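
The filter-then-count step above can be tried without the original CSV. The following is a minimal sketch, assuming a small hand-made sample in place of the real file; the sample rows and the names is_customer and top_customer_stations are made up for illustration. It spells out head(5) explicitly, which is the same as the default head().

```python
import pandas as pd

# Hypothetical sample standing in for the CSV from the question.
rides = pd.DataFrame({
    'end station name': [
        'Washington St & Gansevoort St', 'Barclay St & Church St',
        'Washington St & Gansevoort St', 'Broadway & W 32 St',
        'W 33 St & 7 Ave', 'W 33 St & 7 Ave',
    ],
    'User Type': ['Customer', 'Subscriber', 'Customer', 'Customer',
                  'Subscriber', 'Subscriber'],
})

# Boolean mask: True for rows where the rider is a Customer.
is_customer = rides['User Type'] == 'Customer'

# Keep only those rows, then count end stations. value_counts() sorts
# by count in descending order, and head(5) keeps at most the top five.
top_customer_stations = (
    rides.loc[is_customer, 'end station name'].value_counts().head(5)
)
print(top_customer_stations)
```

Using .loc with the mask and the column name selects the rows and the column in one step, which avoids creating the intermediate df1.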

But if you need the top five for every user type:

df = rides.groupby('User Type')['end station name'] \
          .apply(lambda x: x.value_counts().head()) \
          .reset_index(name='count') \
          .rename(columns={'level_1':'end station name'})
print (df)
    User Type               end station name  count
0    Customer  Washington St & Gansevoort St      2
1    Customer             Broadway & W 32 St      1
2    Customer                1 Ave & E 15 St      1
3    Customer                E 39 St & 3 Ave      1
4    Customer                9 Ave & W 45 St      1
5  Subscriber                8 Ave & W 33 St      3
6  Subscriber                W 33 St & 7 Ave      3
7  Subscriber               W 59 St & 10 Ave      1
8  Subscriber           E 24 St & Park Ave S      1
9  Subscriber                W 17 St & 8 Ave      1
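
As a possible alternative to the apply-based version, the per-group counts can be computed with value_counts on the grouped column and then trimmed with a second groupby. This is only a sketch under the assumption of a small made-up sample (the station names 'A St' to 'D St' and the variable names counts and top5 are hypothetical), not necessarily how the answer's author would write it.

```python
import pandas as pd

# Hypothetical sample; station names are placeholders.
rides = pd.DataFrame({
    'end station name': ['A St', 'B St', 'A St', 'C St', 'C St', 'D St', 'C St'],
    'User Type': ['Customer', 'Customer', 'Customer',
                  'Subscriber', 'Subscriber', 'Subscriber', 'Subscriber'],
})

# Count stations within each user type; counts are sorted in
# descending order inside every group.
counts = rides.groupby('User Type')['end station name'].value_counts()

# Keep at most the first five rows of each group (index level 0 is
# 'User Type'), then flatten the MultiIndex into ordinary columns.
top5 = counts.groupby(level=0).head(5).reset_index(name='count')
print(top5)
```

Because the inner index level is already named after the station column, no rename step is needed here.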
jezrael