1

I have a list of names:

lst = ['Albert', 'Carl', 'Julian', 'Mary']

and I have a DF:

target     id      name
  A       100     Albert
  A       110     Albert
  B       200     Carl
  D       500     Mary
  E       235     Mary

I want to make another dataframe counting how many id per name in lst:

lst_names   Count
Albert       2
Carl         1
Julian       0
Mary         2

What's the most efficient way to do this considering the list of names has 12k unique names on it?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
pouchewar
  • 399
  • 1
  • 10

2 Answers2

2

Check with value_counts

pd.Categorical(df['name'],lst).value_counts()
Out[894]: 
Albert    2
Carl      1
Julian    0
Mary      2
dtype: int64

Or

df['name'].value_counts().reindex(lst,fill_value=0)
Out[896]: 
Albert    2
Carl      1
Julian    0
Mary      2
Name: name, dtype: int64
BENY
  • 317,841
  • 20
  • 164
  • 234
2

You can use value_counts, and then create an empty Series with lst as the index, and then add them together, filling NaN with 0:

(df['name'].value_counts() + pd.Series(index=lst, dtype=int)).fillna(0).astype(int)

Output:

>>> df
Albert    2
Carl      1
Julian    0
Mary      2
Name: count, dtype: int64