3

I have a pandas DataFrame that looks similar to this:

          player     frameID    x          y
0         Tom        0          1          3
1         Tom        1          2          3
2         Tom        2          1          3
3         John       0          4          2
4         John       1          3          1
5         John       2          2          2
6         Greg       0          5          3
7         Greg       1          3          2
8         Greg       2          2          1
.         .          .          .          .
.         .          .          .          .
.         .          .          .          .

And I want to format it so that it looks like this:

          player  Tom           John          Greg
frameID    
                  x      y      x      y      x      y

0                 1      3      4      2      5      3
1                 2      3      3      1      3      2
2                 1      3      2      2      2      1
.                 .      .      .      .      .      .
.                 .      .      .      .      .      .
.                 .      .      .      .      .      .

However, I have no clue how to go about the multi-indexing. As you can see, I want to take two of the columns and use one (player) as the top level of the column index and the other (frameID) as the row index. Any help would be greatly appreciated.

Shubham Sharma
  • as an aside: why do you want a multiindex? I've found almost everything can be done with column values instead (unless you need better performance) – anon01 Nov 21 '20 at 07:46
  • The data is sports movement data, and each line is a frame of a sports play. I wanted it in this format because I need all the players' positions at each frame for use in a clustering algorithm (i.e. in this format each row now has exactly the data I need). – Evan Pfeifer Nov 21 '20 at 09:53

1 Answer

3

Let's create a multilevel index with set_index, then use stack + unstack to reshape the DataFrame:

df.set_index(['frameID', 'player']).stack().unstack([1, 2])

player    Tom     John    Greg   
          x  y    x  y    x  y
frameID                       
0         1  3    4  2    5  3
1         2  3    3  1    3  2
2         1  3    2  2    2  1
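
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the DataFrame contains only the nine rows shown in the question and that each (frameID, player) pair occurs exactly once; unstack raises a ValueError if a pair is duplicated. The variable name wide is only for illustration.

```python
import pandas as pd

# Sample data copied from the question (only the nine rows shown there).
df = pd.DataFrame({
    "player":  ["Tom"] * 3 + ["John"] * 3 + ["Greg"] * 3,
    "frameID": [0, 1, 2] * 3,
    "x":       [1, 2, 1, 4, 3, 2, 5, 3, 2],
    "y":       [3, 3, 3, 2, 1, 2, 3, 2, 1],
})

# 1. set_index moves frameID and player into a two-level row index.
# 2. stack turns the remaining x/y columns into a third index level.
# 3. unstack([1, 2]) moves player (level 1) and the x/y level (level 2)
#    into the columns, leaving frameID as the only row index.
wide = df.set_index(["frameID", "player"]).stack().unstack([1, 2])

print(wide)
```

Each row of wide now holds the x and y values of every player for one frame, and individual cells can be addressed with a (player, coordinate) tuple such as wide.loc[0, ("Tom", "x")].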
Shubham Sharma
  • nice! When would one use a multilevel (column) index? – anon01 Nov 21 '20 at 07:57
  • Thanks @anon01. From the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html), MultiIndex is generally used when data has a logically related structure, as it allows you to do grouping, selection, and reshaping operations in a more concise way. I suggest checking the documentation; you can also refer to [this answer](https://stackoverflow.com/questions/13226029/benefits-of-pandas-multiindex), which nicely explains when to use a MultiIndex. – Shubham Sharma Nov 21 '20 at 08:08
  • 1
    Man you sure are a lifesaver. Thank you so much. – Evan Pfeifer Nov 21 '20 at 09:48
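
To illustrate the selection point raised in the comments, here is a small sketch of what the column MultiIndex makes convenient. It rebuilds the sample data from the question; the names tom, all_x and features are hypothetical, and passing the array to a clustering routine is only an assumption about the use case described above.

```python
import pandas as pd

# Sample data copied from the question (only the nine rows shown there).
df = pd.DataFrame({
    "player":  ["Tom"] * 3 + ["John"] * 3 + ["Greg"] * 3,
    "frameID": [0, 1, 2] * 3,
    "x":       [1, 2, 1, 4, 3, 2, 5, 3, 2],
    "y":       [3, 3, 3, 2, 1, 2, 3, 2, 1],
})
wide = df.set_index(["frameID", "player"]).stack().unstack([1, 2])

# Selecting on the top column level returns that player's x/y columns.
tom = wide["Tom"]

# A cross-section on the inner column level returns one coordinate
# for every player, with the players as columns.
all_x = wide.xs("x", axis=1, level=1)

# A plain 2-D array with one row per frame and one column per
# (player, coordinate) pair, e.g. as input to a clustering routine.
features = wide.to_numpy()

print(tom)
print(all_x)
print(features.shape)
```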