You can do this with some boolean indexing. Consider the following dataframe:
import pandas as pd
import numpy as np
np.random.seed(3)
df = pd.DataFrame({
"season": np.repeat([2017, 2018, 2019], 5),
"team_1": np.random.randint(5, size=15),
"team_2": np.random.randint(5, size=15),
"toss_winner": np.random.randint(5, size=15)
})
print(df)
season team_1 team_2 toss_winner
0 2017 2 4 3
1 2017 0 0 0
2 2017 1 2 3
3 2017 3 1 1
4 2017 0 2 3
5 2018 0 2 0
6 2018 0 2 4
7 2018 3 4 1
8 2018 2 1 4
9 2018 3 3 4
10 2019 1 2 0
11 2019 1 0 0
12 2019 2 1 1
13 2019 0 2 2
14 2019 4 0 4
If you're specifically looking for how many times either team scored 1 in the year 2017, you can do that by using some fun boolean indexing. First, I'm seeing where "team_1" or "team_2" was 1 which produces a 2d boolean array, then I use .any(axis=1)
to give me a 1d boolean array with the same length as my dataframe where it's true if either "team_1" or "team_2" is 1. Next I'm creating a boolean array for my dataframe wherever "season" == 2017. We can combine these to create a specific subset of the 2017 season wherever either was 1.
team_1_played = (df[["team_1", "team_2"]] == 1).any(axis=1)
season_2017 = df["season"] == 2017
print(df.loc[season_2017 & team_1_played ])
season team_1 team_2 toss_winner
2 2017 1 2 3
3 2017 3 1 1
We can also combine and sum
the boolean arrays to see how many games happened where a team 1 played in 2017. Note that this is the same as the length of the dataframe subset from above.
print((season_2017 & team_1_played ).sum())
2
You can also count the number of times each value appears for either team with a groupby/apply approach.
out = (df.groupby("season")[["team_1", "team_2"]]
.apply(lambda group: group.apply(pd.Series.value_counts))
.fillna(0))
print(out)
team_1 team_2
season
2017 0 2.0 1.0
1 1.0 1.0
2 1.0 2.0
3 1.0 0.0
4 0.0 1.0
2018 0 2.0 0.0
1 0.0 1.0
2 1.0 2.0
3 2.0 1.0
4 0.0 1.0
2019 0 1.0 2.0
1 2.0 1.0
2 1.0 2.0
4 1.0 0.0
As an example, in this dataframe, you can see that in 2017, team 2 played as "team_1" 1.0 times, whereas they played as "team_2" 2.0 times.