5

I have a list :

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

and a pandas Dataframe df1 with these values

first   last            city                                email
John    Travis          New York                            a@email.com
Jim     Perterson       San Franciso, Los Angeles           b@email.com
Nancy   Travis          Chicago                             b1@email.com
Jake    Templeton       Los Angeles                         b3@email.com
John    Myers           New York                            b4@email.com
Peter   Johnson         San Franciso, Chicago               b5@email.com
Aby     Peters          Los Angeles                         b6@email.com
Amy     Thomas          San Franciso                        b7@email.com
Jessica Thompson        Los Angeles, Chicago, New York      b8@email.com

I want to count the number of times each city from citylist occurs in the dataframe column 'city':

New York        3       
San Francisco   3
Los Angeles     4
Chicago         3
Miami           0

Currently I have

dftest = df1.groupby(by='city', as_index=False).agg({'id': pd.Series.nunique})

and it ends counting "Los Angeles, Chicago, New York" as 1 unique value

Is there any way to get counts as I have show above? Thanks

user14262559
  • 165
  • 8

2 Answers2

4

Try this:

Fix data first:

df1['city'] = df1['city'].str.replace('Franciso', 'Francisco')

Use this:

(df1['city'].str.split(', ')
            .explode()
            .value_counts(sort=False)
            .reindex(citylist, fill_value=0))

Output:

New York         3
San Francisco    3
Los Angeles      4
Chicago          3
Miami            0
Name: city, dtype: int64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
2

You can use Series.str.count:

pd.Series([df['city'].str.count(c).sum() for c in citylist], index=citylist)

Another more efficient approach as suggested by @ScottBoston

pd.Series({c:sum(c in i for i in df['city']) for c in citylist})

New York         3
San Francisco    0
Los Angeles      4
Chicago          3
Miami            0
dtype: int64
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53