-2

I have a pandas Dataframe that has been grouped into a multi-index, and it basically groups the stats by game id, having the level 2 index be the team's id. I am using this data for machine learning and need the stats to be on the same row, where the column name will be joined by either team1 or team1, as in team1.pts and team2.pts. Is this possible in any convenient way?

Thank you so much for the help, I attached a screenshot of the dataframe in case I did a horrible job of explaining this.

Table Example

EDIT:

For a better way to explain this, every consecutive row should be joined with the row above it, to sort of flatten the dataframe to half the amount of rows and double the amount of columns.

  • no images please... read this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples Please provide 3 things: 1) copy and paste-able input 2) what you have tried 3) Expected Output. Downvoting for now. – David Erickson Jul 10 '20 at 23:19
  • Please [create a reproducible copy of the DataFrame with `df.head(10).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard), [edit] the question, and paste the clipboard into a code block. – Trenton McKinney Jul 11 '20 at 01:51

1 Answers1

0
  1. always provide data - going through OCR is painful to provide help
  2. reset_index() so it's just row numbers
  3. calc gameset you note you want to join consecutive games. This does assume they come in pairs. Will need a better calc if they don't
  4. join 1st and 2nd (last) rows in gameset. column names are suffixes not prefixes
js = {"game.id":[15,15,16,16,17,17,18,18],"team.id":[16,30,14,25,4,22,3,30],"id":[2892,4836,5005,4771,5538,5707,7397,7228],"ast":[6,19,23,21,32,18,22,14],"blk":[1,15,6,10,4,7,6,1],"dreb":[14,33,38,40,43,32,43,24]}
df = pd.DataFrame(js).set_index(["game.id","team.id"])
dfx = df.reset_index().assign(gameset=lambda r: r.index//2) 
dfx.groupby(["game.id","gameset"]).first().join(dfx.groupby(["game.id","gameset"]).last(), lsuffix=".1st", rsuffix=".2nd")

output

        team.id.1st id.1st  ast.1st blk.1st dreb.1st    team.id.2nd id.2nd  ast.2nd blk.2nd dreb.2nd
game.id gameset                                     
15  0   16  2892    6   1   14  30  4836    19  15  33
16  1   14  5005    23  6   38  25  4771    21  10  40
17  2   4   5538    32  4   43  22  5707    18  7   32
18  3   3   7397    22  6   43  30  7228    14  1   24

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30