count number of elements in column pandas based on another column's value

Question

I am trying to count the number of times a value has occurred on the based of another column value in pandas dataframe.Here is the data frame:

Basically, I want to count the number of times 1 occur in team 1 or team 2 in the season 2017(based on year).Till now here is my code

total_wins={}
teams=matches['team1'].unique()
for i in teams:
    total_matches[i]=len(matches[matches['winner'] == i])+len(matches[matches['team2'] == i])

Please provide your sample data in the text of your question, not as a picture or link. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Please also provide a sample of your expected output — G. Anderson, Sep 30 '20 at 17:03
For what I understood, you might wanna do `total_matches[i]=len(matches[matches['team1'] == i])+len(matches[matches['team2'] == i])` instead? (use team1 column instead of winner) — mgmussi, Sep 30 '20 at 17:14
@mgmussi no that won't help me as I want to calculate the total number of matches played in a particular season — maria_g, Sep 30 '20 at 17:23
So you want to filter the season (2017, 2018, 2019...) and for each, count the number of times Team '1' played as team1 or team2, is that right? You don't care about the winner in this particular question, right? — mgmussi, Sep 30 '20 at 17:31

score 2 · Answer 1 · answered Sep 30 '20 at 17:29

Seems like a job for groupby!

team_id = 1

result = (
    matches
        .groupby("season")
        .apply(
            lambda group: group.team1.eq(team_id).sum() + group.team2.eq(team_id).sum()
        )

In words:

look at the subset of the frame corresponding to each unique value of 'season'
apply to this subset a function that checks where the team1 and team2 column equal team_id ...
... then sums the resulting boolean (True/False) series (= number of Trues) ...
... then adds the two together

Because you provided an image of the DataFrame rather than something I could select, I can't produce example output for your precise case, but you should end up with a Series indexed by season and containing the number of matches the team with ID 1 played that season, just as you asked.

From this you can get the answer for the 2017 season with result["2017"]. You can then loop over the team IDs to get results for the other teams.

score 1 · Accepted Answer · edited Sep 30 '20 at 17:36

You can do this with some boolean indexing. Consider the following dataframe:

import pandas as pd
import numpy as np

np.random.seed(3)

df = pd.DataFrame({
    "season": np.repeat([2017, 2018, 2019], 5),
    "team_1": np.random.randint(5, size=15),
    "team_2": np.random.randint(5, size=15),
    "toss_winner": np.random.randint(5, size=15)
})

print(df)
    season  team_1  team_2  toss_winner
0     2017       2       4            3
1     2017       0       0            0
2     2017       1       2            3
3     2017       3       1            1
4     2017       0       2            3
5     2018       0       2            0
6     2018       0       2            4
7     2018       3       4            1
8     2018       2       1            4
9     2018       3       3            4
10    2019       1       2            0
11    2019       1       0            0
12    2019       2       1            1
13    2019       0       2            2
14    2019       4       0            4

If you're specifically looking for how many times either team scored 1 in the year 2017, you can do that by using some fun boolean indexing. First, I'm seeing where "team_1" or "team_2" was 1 which produces a 2d boolean array, then I use .any(axis=1) to give me a 1d boolean array with the same length as my dataframe where it's true if either "team_1" or "team_2" is 1. Next I'm creating a boolean array for my dataframe wherever "season" == 2017. We can combine these to create a specific subset of the 2017 season wherever either was 1.

team_1_played = (df[["team_1", "team_2"]] == 1).any(axis=1)
season_2017 = df["season"] == 2017

print(df.loc[season_2017 & team_1_played ])
   season  team_1  team_2  toss_winner
2    2017       1       2            3
3    2017       3       1            1

We can also combine and sum the boolean arrays to see how many games happened where a team 1 played in 2017. Note that this is the same as the length of the dataframe subset from above.

print((season_2017 & team_1_played ).sum())
2

You can also count the number of times each value appears for either team with a groupby/apply approach.

out = (df.groupby("season")[["team_1", "team_2"]]
       .apply(lambda group: group.apply(pd.Series.value_counts))
       .fillna(0))

print(out)
          team_1  team_2
season                  
2017   0     2.0     1.0
       1     1.0     1.0
       2     1.0     2.0
       3     1.0     0.0
       4     0.0     1.0
2018   0     2.0     0.0
       1     0.0     1.0
       2     1.0     2.0
       3     2.0     1.0
       4     0.0     1.0
2019   0     1.0     2.0
       1     2.0     1.0
       2     1.0     2.0
       4     1.0     0.0

As an example, in this dataframe, you can see that in 2017, team 2 played as "team_1" 1.0 times, whereas they played as "team_2" 2.0 times.

count number of elements in column pandas based on another column's value

2 Answers2