1

I am using the get_games endpoint of this API: https://github.com/CFBD/cfbd-python/blob/master/docs/GamesApi.md#get_games. It returns what appears to be a list of dictionaries, and I want to get the data into a format that I can manipulate or store in a database. I tried converting it to a pandas DataFrame with pd.DataFrame(), as outlined in this question: Convert list of dictionaries to a pandas DataFrame. I first store the API response in a variable, api_response = api_instance.get_games(), and then convert it with df = pd.DataFrame(api_response). Printing that DataFrame shows only one column containing the entire dictionary for each game, instead of one column per key populated with the corresponding values.

An example of how the data is returned for two games, from print(api_response), is in the following format:

[{'attendance': None,
 'away_conference': 'FBS Independents',
 'away_id': 87,
 'away_line_scores': [7, 10, 21, 0, 3],
 'away_points': 41,
 'away_post_win_prob': 0.44707054087049625,
 'away_team': 'Notre Dame',
 'conference_game': True,
 'excitement_index': 7.4132284343,
 'highlights': None,
 'home_conference': 'ACC',
 'home_id': 52,
 'home_line_scores': [7, 7, 6, 18, 0],
 'home_points': 38,
 'home_post_win_prob': 0.5529294591295038,
 'home_team': 'Florida State',
 'id': 401282614,
 'neutral_site': False,
 'notes': None,
 'season': 2021,
 'season_type': 'regular',
 'start_date': '2021-09-05T23:30:00.000Z',
 'start_time_tbd': False,
 'venue': 'Bobby Bowden Field at Doak Campbell Stadium',
 'venue_id': 3697,
 'week': 1}, {'attendance': None,
 'away_conference': 'ACC',
 'away_id': 97,
 'away_line_scores': [0, 0, 10, 14],
 'away_points': 24,
 'away_post_win_prob': 0.04096564974450303,
 'away_team': 'Louisville',
 'conference_game': False,
 'excitement_index': 4.6236823229,
 'highlights': None,
 'home_conference': 'SEC',
 'home_id': 145,
 'home_line_scores': [9, 17, 3, 14],
 'home_points': 43,
 'home_post_win_prob': 0.959034350255497,
 'home_team': 'Ole Miss',
 'id': 401282055,
 'neutral_site': True,
 'notes': None,
 'season': 2021,
 'season_type': 'regular',
 'start_date': '2021-09-07T00:00:00.000Z',
 'start_time_tbd': False,
 'venue': 'Mercedes-Benz Stadium',
 'venue_id': 5348,
 'week': 1}]

Is there a better way to store this that I am overlooking?

Tom Mallinson
  • `pd.DataFrame(your_data)` returns a dataframe with 2 rows and 26 columns. – Corralien Sep 14 '21 at 20:40
  • I am storing the API response as a variable ```api_response = api_instance.get_games(year=year, week=week, id=id)``` where I specify year, week, and game id parameters. Whenever I try to convert it to a DataFrame ```df = pd.DataFrame(api_response)``` and print the result, it returns a DataFrame with one row and one column containing the entire contents of a dictionary. I used two games as the example in my original question, just to note. – Tom Mallinson Sep 14 '21 at 20:45
  • Updated my original question to explain this a bit better. – Tom Mallinson Sep 14 '21 at 20:48
  • Did you try `pd.DataFrame.from_dict(your_data, orient='index')`? – Corralien Sep 14 '21 at 20:50
  • Does `api_response` actually contain the parsed, nested data structure? Or does it contain a *single string* with JSON data? – Karl Knechtel Sep 14 '21 at 20:52
  • @KarlKnechtel. From [the documentation](https://github.com/CFBD/cfbd-python/blob/master/docs/GamesApi.md#get_games), the API call returns a list of `Game` objects. – Corralien Sep 14 '21 at 20:57
  • I did try the ```from_dict``` function but it returns "AttributeError: 'list' object has no attribute 'values'" @Corralien – Tom Mallinson Sep 14 '21 at 20:57
  • Can you try: `type(my_list[0])` – Corralien Sep 14 '21 at 21:04
  • Printing that returns ```<class 'cfbd.models.game.Game'>```. Printing just the type of the variable ```api_response``` returns ```<class 'list'>```. – Tom Mallinson Sep 14 '21 at 21:06

4 Answers

2

According to the source code, Game objects have a to_dict method:

Try:

df = pd.DataFrame([game.to_dict() for game in api_response])

Note:

This problem has been discussed previously, see for example here. In order for pandas to convert this to a DataFrame we need to make sure that we're actually dealing with a list of dictionaries.

Usually we might print our api_response and look at the data. In this case, that is not enough: even though api_response looks like a list of dictionaries (read more about __repr__ here), it is actually a list of Game objects.

We can learn this by printing the type of the first element in our list:

>>> print(type(api_response[0]))
<class 'cfbd.models.game.Game'>

Some classes will have a to_dict method attached to them. If they don't, you can use vars instead:

df = pd.DataFrame([vars(game) for game in api_response])
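
The difference between the three conversions can be sketched without calling the real API. The Game class below is a hypothetical stand-in, not the cfbd model itself; it only assumes, as is common for generated model classes, that values live in underscore-prefixed attributes and that __repr__ prints the to_dict() form.

```python
import pandas as pd


class Game:
    """Hypothetical stand-in for a generated model class (not cfbd's real Game)."""

    def __init__(self, id, home_team, away_team):
        # Assumption: values are stored in underscore-prefixed attributes.
        self._id = id
        self._home_team = home_team
        self._away_team = away_team

    def to_dict(self):
        return {
            "id": self._id,
            "home_team": self._home_team,
            "away_team": self._away_team,
        }

    def __repr__(self):
        # Printing shows the dict form, which hides the real type.
        return repr(self.to_dict())


api_response = [
    Game(401282614, "Florida State", "Notre Dame"),
    Game(401282055, "Ole Miss", "Louisville"),
]

print(api_response)           # looks like a list of dictionaries
print(type(api_response[0]))  # but each element is a Game instance

# Passing the objects directly: one column holding whole objects.
wrong = pd.DataFrame(api_response)

# Converting each object first: one column per key.
right = pd.DataFrame([game.to_dict() for game in api_response])

# vars() exposes the raw attribute names, here with leading underscores.
from_vars = pd.DataFrame([vars(game) for game in api_response])

print(wrong.shape, list(right.columns), list(from_vars.columns))
```

If the real class stores its attributes the same way, vars would give column names such as _id, so to_dict is the better choice whenever it exists.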
user3471881
Corralien
  • Maybe explain why Game might look like a dict when printing but really isn't. – user3471881 Sep 14 '21 at 21:38
  • @user3471881. This is the representation `__repr__` of the object. Check [this code](https://github.com/CFBD/cfbd-python/blob/f1c210701bda0f279edf52fb38b4620678a8f132/cfbd/models/game.py#L753-L759) and you will understand. `__repr__` calls `to_str`, which calls `to_dict`. – Corralien Sep 14 '21 at 21:42
  • @user3471881 You can see it here: https://github.com/CFBD/cfbd-python/blob/f1c210701bda0f279edf52fb38b4620678a8f132/cfbd/models/game.py#L757 – emremrah Sep 14 '21 at 21:42
  • @Corralien you're always a step ahead :) – emremrah Sep 14 '21 at 21:43
  • My point was that you can add this to the answer so that it might help someone who has a similar problem with lists of objects that look like dictionaries but are in fact...not :) – user3471881 Sep 14 '21 at 21:45
  • @user3471881. try: `repr([{'a': 0}, {'b': 1}])`. It looks like a list of dicts? but it's a string. This is the same case with Python base types. – Corralien Sep 14 '21 at 21:49
  • I know. And a lot of pandas users on this forum are *not* familiar with Python basics. – user3471881 Sep 14 '21 at 21:51
  • If we don't expand in the answer to give other users (that don't use this specific API) some guidance - I think this answer belongs in a comment and the question should probably be deleted. (imo) – user3471881 Sep 14 '21 at 21:55
  • @user3471881. I updated my answer. Feel free to amend it. – Corralien Sep 14 '21 at 22:02
  • Very strange design choice IMO to have a class that doesn't subclass `dict`, but looks exactly like one in its `__repr__`. I *guess* the intent is to make it easier to serialize back to JSON, but I don't think that actually works. I would file an issue about it, honestly. – Karl Knechtel Sep 14 '21 at 23:54
  • "This class is auto generated by the swagger code generator program." Found this too: https://stackoverflow.com/questions/63100487/how-to-parse-a-list-and-a-dictionary-openapi-and-swagger-in-python. – user3471881 Sep 15 '21 at 06:03
0

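You can use something like this for a single game, after converting it to a dictionary (api_response itself is a list, so it has no keys() or values()):

game = api_response[0].to_dict()
df = pd.DataFrame({'col_1': list(game.keys()), 'col_2': list(game.values())})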
0

Maybe you could try checking the type of the data received from the API:

type(api_response)

The code below produces the output you are looking for: a DataFrame with 2 rows and 26 columns.

import pandas as pd
d1=[{'attendance': None,'away_conference': 'FBS Independents','away_id': 87,'away_line_scores': [7, 10, 21, 0, 3],'away_points': 41,'away_post_win_prob': 0.44707054087049625,'away_team': 'Notre Dame','conference_game': True,'excitement_index': 7.4132284343,'highlights': None,'home_conference': 'ACC','home_id': 52,'home_line_scores': [7, 7, 6, 18, 0],'home_points': 38,'home_post_win_prob': 0.5529294591295038,'home_team': 'Florida State','id': 401282614,'neutral_site': False,'notes': None,'season': 2021,'season_type': 'regular','start_date': '2021-09-05T23:30:00.000Z','start_time_tbd': False,'venue': 'Bobby Bowden Field at Doak Campbell Stadium','venue_id': 3697,'week': 1}, {'attendance': None,'away_conference': 'ACC','away_id': 97,'away_line_scores': [0, 0, 10, 14],'away_points': 24,'away_post_win_prob': 0.04096564974450303,'away_team': 'Louisville','conference_game': False,'excitement_index': 4.6236823229,'highlights': None,'home_conference': 'SEC','home_id': 145,'home_line_scores': [9, 17, 3, 14],'home_points': 43,'home_post_win_prob': 0.959034350255497,'home_team': 'Ole Miss','id': 401282055,'neutral_site': True,'notes': None,'season': 2021,'season_type': 'regular','start_date': '2021-09-07T00:00:00.000Z','start_time_tbd': False,'venue': 'Mercedes-Benz Stadium','venue_id': 5348,'week': 1}]
df = pd.DataFrame(data=d1)
display(df)
Mohan
0

From the source code, GamesApi.get_games returns a list of Game objects. From there you can see that Game has a to_dict method, which lets you create a DataFrame from a list of dictionaries.
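
Since the question also asks about storing the data in a database, here is a minimal sketch of that step, assuming the games have already been converted to plain dictionaries with to_dict. The in-memory SQLite database and the table name games are illustrative assumptions, and the records are trimmed versions of the two games in the question. SQLite has no list type, so list-valued fields such as home_line_scores are serialized to JSON text before writing.

```python
import json
import sqlite3

import pandas as pd

# Trimmed records in the shape shown in the question (already plain dicts).
games = [
    {"id": 401282614, "home_team": "Florida State", "home_points": 38,
     "home_line_scores": [7, 7, 6, 18, 0]},
    {"id": 401282055, "home_team": "Ole Miss", "home_points": 43,
     "home_line_scores": [9, 17, 3, 14]},
]

df = pd.DataFrame(games)

# Serialize the list-valued column to JSON text so SQLite can store it.
df["home_line_scores"] = df["home_line_scores"].apply(json.dumps)

# Assumption: an in-memory database and a table named "games", for illustration.
conn = sqlite3.connect(":memory:")
try:
    df.to_sql("games", conn, index=False, if_exists="replace")
    stored = pd.read_sql(
        "SELECT id, home_team, home_line_scores FROM games", conn
    )
finally:
    conn.close()

# Decode the JSON text back into Python lists when reading.
stored["home_line_scores"] = stored["home_line_scores"].apply(json.loads)
print(stored)
```

Keeping the per-period scores as JSON text is the simplest option; a separate table with one row per period would be the more normalized alternative.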

emremrah