0

Currently playing around with some data from an API using pandas, but one area I'm particularly interested in seems to be crammed into one column as a list of dictionaries.

How can I split this column out into four new columns?

Currently:

chip_plays
__________
[{"chip_name":"bboost",
"num_played":114585},
{"chip_name":"freehit",
"num_played":89437},
{"chip_name":"wildcard",
"num_played":263066},
{"chip_name":"3xc",
"num_played":216711}]

[{"chip_name":"bboost",
"num_played":79069},
{"chip_name":"freehit",
"num_played":41897},
{"chip_name":"wildcard",
"num_played":103086},
{"chip_name":"3xc",
"num_played":74713}]

...

Desired outcome:

bboost     freehit     wildcard     3xc
_________________________________________
114585     89437       263066       216711
79069      41897       103086       74713
...

In case it's useful, the api link is: https://fantasy.premierleague.com/api/bootstrap-static/

Screenshot of dataframe: https://share.getcloudapp.com/o0u8yYZn

2 Answers2

0

pivot_table does the job:

df = pd.DataFrame([{"chip_name":"bboost",
"num_played":114585},
{"chip_name":"freehit",
"num_played":89437},
{"chip_name":"wildcard",
"num_played":263066},
{"chip_name":"3xc",
"num_played":216711}])

print(pd.pivot_table(df, values='num_played', columns='chip_name'))
chip_name      3xc  bboost  freehit  wildcard
num_played  216711  114585    89437    263066
Rm4n
  • 623
  • 7
  • 14
0

PLEASE SEE UPDATE

The below example provides the output you are looking for given the response provided in your original question.

It creates the df you want by way of a dictionary comprehension which:

  • Iterates across the list (response)
  • Inserts each "chip_name" value as a key
  • Attributes a list containing the corresponding "num_played" value to each key

The resultant dictionary (data) is then passed as an argument to the pd.DataFrame() to create your desired output.

import pandas as pd

response = [{"chip_name":"bboost",
"num_played":114585},
{"chip_name":"freehit",
"num_played":89437},
{"chip_name":"wildcard",
"num_played":263066},
{"chip_name":"3xc",
"num_played":216711}]

data = { i['chip_name']: [i['num_played']] for i in response} 

df = pd.DataFrame(data)

Will give you the following df.

   bboost  freehit  wildcard     3xc
0  114585    89437    263066  216711

NOTE: The value passed doesn't necessarily have to be within a list, however if it isn't see this question for information about passing scalar values to pd.DataFrame().

UPDATE

Below example works for question following OP update to question. Fairly self explanatory:

  • Defines column names based on response content
  • Creates empty df with column names
  • Iterates across response list
  • Extracts row data using dictionary comprehension as in original answer
  • Create new df (temp) and append to existing to add new rows
import pandas as pd

response = [
    [{"chip_name":"bboost",
    "num_played":114585},
    {"chip_name":"freehit",
    "num_played":89437},
    {"chip_name":"wildcard",
    "num_played":263066},
    {"chip_name":"3xc",
    "num_played":216711}],

    [{"chip_name":"bboost",
    "num_played":79069},
    {"chip_name":"freehit",
    "num_played":41897},
    {"chip_name":"wildcard",
    "num_played":103086},
    {"chip_name":"3xc",
    "num_played":74713}]
]

columns = ['bboost', 'freehit', 'wildcard', '3xc']
df = pd.DataFrame(columns=columns)

for i in response:
    data = { j['chip_name']: [j['num_played']] for j in i }
    temp = pd.DataFrame(data=data)
    df = df.append(temp, ignore_index=True)

df looks like this:

   bboost freehit wildcard     3xc
0  114585   89437   263066  216711
1   79069   41897   103086   74713

Will work fine with variable response length.

NOTE: Seems there may be a typo on part of OP in updated question. My update works under the assumption that the response is a list of lists containing single item dictionaries as this seems to make the most sense based on example given.

JPI93
  • 1,507
  • 5
  • 10