-2

Calculate the total number of matches played by each team when it appears in both the HomeTeam and AwayTeam columns, using pandas/PySpark.

I thought of using a join: first, I groupby() on HomeTeam to get the number of matches each team played at home, and do the same for AwayTeam. Then I join the two results on the team name. I have attached the dataframe as a screenshot. Is there a better way to do it?
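For illustration, the groupby-then-join approach described above might look like the following in pandas. This is only a sketch: the sample rows and the names `home_matches`, `away_matches`, and `total_matches` are hypothetical, since the original dataframe was only shared as a screenshot.

```python
import pandas as pd

# Hypothetical sample data; only the HomeTeam/AwayTeam column names
# come from the question.
df = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Arsenal", "Leeds"],
    "AwayTeam": ["Chelsea", "Arsenal", "Leeds", "Chelsea"],
})

# Count matches per team separately for each role.
home = df.groupby("HomeTeam").size().rename("home_matches")
away = df.groupby("AwayTeam").size().rename("away_matches")

# Outer join on the team name (the index), so a team that appears in
# only one of the two columns is kept with a count of 0 for the other.
totals = home.to_frame().join(away, how="outer").fillna(0).astype(int)
totals["total_matches"] = totals["home_matches"] + totals["away_matches"]

print(totals)
```

The outer join matters here: with an inner join, a team that never played at home (or never played away) would silently drop out of the result.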

Mario
  • 1,631
  • 2
  • 21
  • 51
Chinmay
  • 1
  • 1
  • Welcome to SO. [Pivot table](https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe), maybe? This is also a possible duplicate of this [post](https://stackoverflow.com/q/49671240/10452700). Also, when you post a question, please include what you have tried, provide **reproducible examples**, and state your expected output. – Mario Jan 29 '23 at 18:08
  • Can you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) instead of a screenshot? – Mario Feb 01 '23 at 15:29

1 Answer

0

You can group by both team columns (which fits your request: "When it is present in both HomeTeam and Away Team"). This counts the matches for each (HomeTeam, AwayTeam) pairing:

df.groupBy("HomeTeam","AwayTeam").count().show(truncate=False)
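Note that grouping on both columns yields one count per home/away pairing rather than one total per team. If a per-team total is what is wanted, one possible alternative is to stack the two columns into a single column of team names and count occurrences. The sketch below uses pandas with hypothetical sample data and a hypothetical `total_matches` name; in PySpark the same idea could be expressed by selecting each column under a common alias, combining them with `union`, and then calling `groupBy(...).count()`.

```python
import pandas as pd

# Hypothetical sample data; only the HomeTeam/AwayTeam column names
# come from the question.
df = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Chelsea", "Arsenal", "Leeds"],
    "AwayTeam": ["Chelsea", "Arsenal", "Leeds", "Chelsea"],
})

# Stack both columns into one Series of team names, then count how
# often each team appears, regardless of home or away role.
team_totals = (
    pd.concat([df["HomeTeam"], df["AwayTeam"]], ignore_index=True)
    .value_counts()
    .rename("total_matches")
)

print(team_totals)
```

This avoids the join entirely, because every match contributes exactly one home appearance and one away appearance to the stacked column.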
RomanPerekhrest
  • 88,541
  • 4
  • 65
  • 105