-1

I have a pandas series that looks like this:

GameID PlayID FrameID dictionary
1 2 1 {34: 63, 23: 73, 42: 96}
2 {34: 63, 23: 73, 42: 94}
1 5 1 {62: 63, 25: 73, 72: 94}
2 {34: 63, 23: 73, 42: 94}
1 2 1 {10: 11, 2: 94, 3: 35}
2 {52: 11, 91: 34, 12: 35}

And want to create a dataframe that would look like this:

GameID PlayID FrameID playerId opposingId
1 2 1 34 63
1 2 1 23 73
1 2 1 42 96
1 5 1 62 63
1 5 1 25 73
1 5 1 72 94

Is there an easy way to do this?

I've tried converting the list of dictionaries into their own data frame, but because there are so many different keys, it wasn't working as expected. I know with a list, you can use .explode but that doesn't work with dictionaries.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
  • 1
    Welcome to Stack Overflow! Check out the [tour] and [How to ask a good question](/help/how-to-ask), for tips like how to write a good title. These dictionaries aren't multi-level, so the title doesn't really mean anything currently. You can [edit] it. – wjandrea Jan 04 '23 at 03:10
  • Where did the rest of the data go in the output? FrameID=2 is missing, and so is the second occurrence of GameID=1,PlayID=2. – wjandrea Jan 04 '23 at 03:10
  • Where the GameID,PlayID columns are empty, does that mean they're the empty string, or just not shown cause they're part of a MultiIndex? Please make a [reproducible pandas example](/q/20109391/4518341). – wjandrea Jan 04 '23 at 03:12
  • Just so you know, I think I found a solution, but I'm waiting for clarification. – wjandrea Jan 04 '23 at 03:38
  • The rest of the data output is missing just because it took too long to type. The empty rows are simply not shown because of multi index. sorry for the confusion! – Ben Wolbransky Jan 04 '23 at 14:04
  • Please [edit] to clarify – wjandrea Jan 04 '23 at 19:00

1 Answers1

0

I hope I understood your question correct.

Step 1: Create new dataframe by nested loop over the old df, I used ffill

df= pd.DataFrame([(G, P, F, k, v) for (G, P, F,  dic) in df.ffill().values for k, v in dic.items()])

Step 2: Rename columns

df.columns = ['GameID', 'PlayID', 'FrameID', 'PlayerID', 'opposingId']

Step 3: remove duplicates by rows by dict keys, only keep first

df.drop_duplicates(subset=['PlayerID'], keep='first').reset_index(drop=True)

Output:

    GameID  PlayID  FraneID PlayerID    opposingId
0      1.0      2.0     1        34      63
1      1.0      2.0     1        23      73
2      1.0      2.0     1        42      96
3      1.0      5.0     1        62      63
4      1.0      5.0     1        25      73
5      1.0      5.0     1        72      94
6      1.0      2.0     1        10      11
7      1.0      2.0     1         2      94
8      1.0      2.0     1         3      35
9      1.0      2.0     2        52      11
10     1.0      2.0     2        91      34
11     1.0      2.0     2        12      35
R. Baraiya
  • 1,490
  • 1
  • 4
  • 17