I have a dataset that consists of all the shots taken in a large number of football competitions for a number of seasons. I wrote the following script to make subsets for each competition and corresponding season.

import pandas as pd

shots = pd.read_csv("C:/Users/HJA/Desktop/Betting/understatV0.01/shots.csv", encoding='iso-8859-1')

# Group once by competition, then select each league from the same GroupBy object
by_competition = shots.groupby('Competition')
shots_premier_league = by_competition.get_group('Premier_League')
shots_bundesliga = by_competition.get_group('Bundesliga')
shots_la_liga = by_competition.get_group('La_Liga')
shots_ligue_1 = by_competition.get_group('Ligue_1')
shots_serie_a = by_competition.get_group('Serie_A')

Everything works fine up to this point. However, I now want to subdivide each competition into samples for each season. I use the following script (in this case with the Premier League as an example):

shots_premier_league_2014 = shots_premier_league.groupby(['Season']).get_group('2014')
shots_premier_league_2015 = shots_premier_league.groupby(['Season']).get_group('2015')
shots_premier_league_2016 = shots_premier_league.groupby(['Season']).get_group('2016')
shots_premier_league_2017 = shots_premier_league.groupby(['Season']).get_group('2017')
shots_premier_league_2018 = shots_premier_league.groupby(['Season']).get_group('2018')

This results in an error.

I am 100% sure that 2014 is an actual value in the Season column. In addition, how can I write a function that automatically includes the competition and season in the name of each pandas DataFrame?

HJA24


I think the problem is that 2014 is an integer, not a string (read_csv parses a column of whole numbers as integers), so you need to remove the quotes:

shots_premier_league_2014 = shots_premier_league.groupby('Season').get_group(2014)
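A minimal sketch to reproduce this, assuming a made-up four-row CSV in place of shots.csv (only the Competition and Season column names come from the question; the Player column is invented). It shows the integer dtype that read_csv infers, the failing string lookup, the working integer lookup and, as an alternative, converting Season to strings with astype(str) so that '2014' can be used as the key:

```python
import io

import pandas as pd

# Hypothetical four-row stand-in for shots.csv; 'Player' is a made-up column
csv_text = """Competition,Season,Player
Premier_League,2014,A
Premier_League,2014,B
Premier_League,2015,C
Bundesliga,2014,D
"""
shots = pd.read_csv(io.StringIO(csv_text))

# read_csv infers an integer dtype for a column holding only whole numbers
season_dtype = shots['Season'].dtype
print(season_dtype)

by_season = shots.groupby('Season')

# Looking up the string '2014' fails because the group keys are integers
string_key_failed = False
try:
    by_season.get_group('2014')
except KeyError:
    string_key_failed = True
print('get_group("2014") raised KeyError:', string_key_failed)

# The integer key works
season_2014 = by_season.get_group(2014)
print(season_2014)

# Alternative: convert the column to strings if you prefer string keys
shots['Season'] = shots['Season'].astype(str)
season_2014_str = shots.groupby('Season').get_group('2014')
print(season_2014_str)
```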

But it is better here to create a dictionary of DataFrames, because creating many variables dynamically (e.g. through globals()) is not recommended:

dfs = dict(tuple(shots_premier_league.groupby('Season')))

Then select each DataFrame by its key:

print(dfs[2014])
print(dfs[2015])
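A small self-contained sketch of the dictionary approach, assuming a hypothetical Premier League-only sample in place of shots_premier_league (the Player column is made up). It groups by the column name rather than a one-element list, so the keys are plain seasons, and loops over dfs.items() to process every season without one variable per season:

```python
import io

import pandas as pd

# Hypothetical Premier League-only sample standing in for shots_premier_league
csv_text = """Competition,Season,Player
Premier_League,2014,A
Premier_League,2014,B
Premier_League,2015,C
"""
shots_premier_league = pd.read_csv(io.StringIO(csv_text))

# Iterating a GroupBy yields (season, DataFrame) pairs; tuple() collects them
# so dict() can build {season: DataFrame}
dfs = dict(tuple(shots_premier_league.groupby('Season')))

# Process every season in one loop instead of writing one variable per season
for season, df_season in dfs.items():
    print(season, len(df_season))
```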

As for your second question (how to automatically include the competition and season in the name of each DataFrame), group the full dataset by both columns and use the (competition, season) tuples as keys:

dfs = dict(tuple(shots.groupby(['Competition', 'Season'])))
print(dfs[('Bundesliga', 2014)])
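A sketch on a hypothetical four-row sample (the real shots.csv is not reproduced here, and the Player column is invented). Grouping the full shots DataFrame, not the Premier League subset, by both columns gives (competition, season) tuple keys for every league:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the full shots DataFrame
csv_text = """Competition,Season,Player
Premier_League,2014,A
Premier_League,2014,B
Premier_League,2015,C
Bundesliga,2014,D
"""
shots = pd.read_csv(io.StringIO(csv_text))

# Each key is a (competition, season) tuple such as ('Bundesliga', 2014)
dfs = dict(tuple(shots.groupby(['Competition', 'Season'])))

print(dfs[('Bundesliga', 2014)])
print(dfs[('Premier_League', 2015)])
```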

If you want to select by strings instead:

d = dict(tuple(shots.groupby(['Competition', 'Season'])))
# Python 3.6+ solution with f-strings
dfs = {f'{k1}_{k2}': v for (k1, k2), v in d.items()}
# Python below 3.6
# dfs = {'{}_{}'.format(k1, k2): v for (k1, k2), v in d.items()}
print(dfs['Bundesliga_2014'])
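Since the question asks for a function, the same idea can be wrapped in one. split_by_columns is a hypothetical helper (its name and parameters are illustrative, not part of pandas), tried here on made-up sample data. It joins the group keys with an underscore and also handles a single grouping column:

```python
import io

import pandas as pd


def split_by_columns(df, columns=('Competition', 'Season'), sep='_'):
    """Split df into a dict keyed by strings such as 'Bundesliga_2014'.

    Hypothetical helper for illustration; not a pandas API.
    """
    subsets = {}
    for keys, group in df.groupby(list(columns)):
        # Some pandas versions yield a scalar key for a one-column groupby;
        # normalise it to a tuple so the join below always works
        if not isinstance(keys, tuple):
            keys = (keys,)
        subsets[sep.join(str(k) for k in keys)] = group
    return subsets


# Hypothetical sample standing in for the full shots DataFrame
csv_text = """Competition,Season,Player
Premier_League,2014,A
Premier_League,2014,B
Premier_League,2015,C
Bundesliga,2014,D
"""
shots = pd.read_csv(io.StringIO(csv_text))

dfs = split_by_columns(shots)
print(dfs['Bundesliga_2014'])

# The same helper can split on a single column
by_competition = split_by_columns(shots, columns=('Competition',))
print(sorted(by_competition))
```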

And if you want to see all the keys for your data:

print(dfs.keys())
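To check what the dictionary contains, a short sketch on a hypothetical sample (the Player column is invented): sorting the keys gives a readable list, and groupby(...).size() reports how many shots each competition/season subset has without building the subsets at all:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the full shots DataFrame
csv_text = """Competition,Season,Player
Premier_League,2014,A
Premier_League,2014,B
Premier_League,2015,C
Bundesliga,2014,D
"""
shots = pd.read_csv(io.StringIO(csv_text))

d = dict(tuple(shots.groupby(['Competition', 'Season'])))
dfs = {f'{k1}_{k2}': v for (k1, k2), v in d.items()}

# A sorted list is easier to scan than the dict_keys view
print(sorted(dfs))

# Number of shots per competition/season, computed directly from the groups
shot_counts = shots.groupby(['Competition', 'Season']).size()
print(shot_counts)
```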
jezrael