2

Working with an NFL dataset with pandas containing all offensive player stats for week 1 of the 2019 season. I currently have three DataFrames, one for passing stats, rushing stats, and receiving stats. I want to combine all three DataFrames into one final DataFrame. The problem is that some players appear in one or more DataFrames. For example, a QB can run and pass the ball, so some QB's appear in both the passing DF and the rushing DF. "Player" is the common index I want to combine them on, but each duplicated row will also have in common the 'Pos' and 'Tm' value. So I want to combine these three DataFrames on the columns 'Player', 'Tm' and 'Pos'.

I currently have each DataFrame saved to a variable in a list named dfs.

I tried

df = dfs[0].join(dfs[1:])

but that results in giving me a DataFrame with one row - Julian Edelman - the only player who ran, passed, and caught the ball in week 1 of the 2019 season. Suffice to say that's not what I'm looking for.

Copied below is the first five rows of each of the DataFrames.

                Pos   Tm PassingYds PassingTD Int PassingAtt Cmp
Player
Lamar Jackson    QB  BAL        324         5   0         20  17
Dak Prescott     QB  DAL        405         4   0         32  25
Robert Griffin   QB  BAL         55         1   0          6   6
Patrick Mahomes  QB  KAN        378         3   0         33  25
Kirk Cousins     QB  MIN         98         1   0         10   8

--------------------------------------------------------------------------
               Pos   Tm Rec Tgt ReceivingYds ReceivingTD
Player
Sammy Watkins   WR  KAN   9  11          198           3
Michael Gallup  WR  DAL   7   7          158           0
John Ross       WR  CIN   7  12          158           2
DeSean Jackson  WR  PHI   8   9          154           2
Marquise Brown  WR  BAL   4   5          147           2
---------------------------------------------------------------------------
                    Pos   Tm RushingAtt RushingYds RushingTD
Player
Marlon Mack          RB  IND         25        174         1
Christian McCaffrey  RB  CAR         19        128         2
Saquon Barkley       RB  NYG         11        120         0
Dalvin Cook          RB  MIN         21        111         2
Mark Ingram          RB  BAL         14        107         2

  • Does this answer your question? [pandas three-way joining multiple dataframes on columns](https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns) – Edeki Okoh Jan 24 '20 at 19:21
  • `dfs = [d.set_index(['Pos', 'Tm'], append=True) for d in dfs]; dfs[0].join(dfs[1:], how='outer')`? – user3483203 Jan 24 '20 at 19:26

2 Answers2

1

You're looking for an outer join with Player, Pos and Tm as an index. First, append these to your index, then call your current attempt with a join type of outer


dfs = [d.set_index(['Pos', 'Tm'], append=True) for d in dfs]
dfs[0].join(dfs[1:], how='outer')

                             PassingYds  PassingTD  Int  PassingAtt   Cmp  Rec   Tgt  ReceivingYds  ReceivingTD  RushingAtt  RushingYds  RushingTD
Player              Pos Tm
Christian McCaffrey RB  CAR         NaN        NaN  NaN         NaN   NaN  NaN   NaN           NaN          NaN        19.0       128.0        2.0
Dak Prescott        QB  DAL       405.0        4.0  0.0        32.0  25.0  NaN   NaN           NaN          NaN         NaN         NaN        NaN
Dalvin Cook         RB  MIN         NaN        NaN  NaN         NaN   NaN  NaN   NaN           NaN          NaN        21.0       111.0        2.0
DeSean Jackson      WR  PHI         NaN        NaN  NaN         NaN   NaN  8.0   9.0         154.0          2.0         NaN         NaN        NaN
John Ross           WR  CIN         NaN        NaN  NaN         NaN   NaN  7.0  12.0         158.0          2.0         NaN         NaN        NaN
Kirk Cousins        QB  MIN        98.0        1.0  0.0        10.0   8.0  NaN   NaN           NaN          NaN         NaN         NaN        NaN
Lamar Jackson       QB  BAL       324.0        5.0  0.0        20.0  17.0  NaN   NaN           NaN          NaN         NaN         NaN        NaN
Mark Ingram         RB  BAL         NaN        NaN  NaN         NaN   NaN  NaN   NaN           NaN          NaN        14.0       107.0        2.0
Marlon Mack         RB  IND         NaN        NaN  NaN         NaN   NaN  NaN   NaN           NaN          NaN        25.0       174.0        1.0
Marquise Brown      WR  BAL         NaN        NaN  NaN         NaN   NaN  4.0   5.0         147.0          2.0         NaN         NaN        NaN
Michael Gallup      WR  DAL         NaN        NaN  NaN         NaN   NaN  7.0   7.0         158.0          0.0         NaN         NaN        NaN
Patrick Mahomes     QB  KAN       378.0        3.0  0.0        33.0  25.0  NaN   NaN           NaN          NaN         NaN         NaN        NaN
Robert Griffin      QB  BAL        55.0        1.0  0.0         6.0   6.0  NaN   NaN           NaN          NaN         NaN         NaN        NaN
Sammy Watkins       WR  KAN         NaN        NaN  NaN         NaN   NaN  9.0  11.0         198.0          3.0         NaN         NaN        NaN
Saquon Barkley      RB  NYG         NaN        NaN  NaN         NaN   NaN  NaN   NaN           NaN          NaN        11.0       120.0        0.0
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • I'm confused as to why you wouldn't want to keep the intersection. If a player had passing stats as well as running stats, you would want the row with that player to reflect all the stats. – user3483203 Jan 24 '20 at 19:39
  • No, au contraire, I want to keep the interesection. That's why I'm trying to make sure I understand the behavior of the `outer join`, making sure it retains the intersection as well and not exclusively the outer sides. – Celius Stingher Jan 24 '20 at 19:43
-1

It's better to convert these following data in .CSV format and then merge the data and later you can import in form of dataframe.

Codewizard_26
  • 68
  • 1
  • 9