
I am trying to store the following into a pandas dataframe:

{    
"page": 1,
      "results": [
        {
          "poster_path": null,
          "adult": false,
          "overview": "Go behind the scenes during One Directions sell out \"Take Me Home\" tour and experience life on the road.",
          "release_date": "2013-08-30",
          "genre_ids": [
            99,
            10402
          ],
          "id": 164558,
          "original_title": "One Direction: This Is Us",
          "original_language": "en",
          "title": "One Direction: This Is Us",
          "backdrop_path": null,
          "popularity": 1.166982,
          "vote_count": 55,
          "video": false,
          "vote_average": 8.45
        },
        {
          "poster_path": null,
          "adult": false,
          "overview": "",
          "release_date": "1954-06-22",
          "genre_ids": [
            80,
            18
          ],
          "id": 654,
          "original_title": "On the Waterfront",
          "original_language": "en",
          "title": "On the Waterfront",
          "backdrop_path": null,
          "popularity": 1.07031,
          "vote_count": 51,
          "video": false,
          "vote_average": 8.19
         }
           etc....
           etc.....
      ],
      "total_results": 61,
      "total_pages": 4
    }

What is the simplest way to store all attributes of each result in a pandas dataframe?

I am storing the JSON objects in a dict variable. Do I really need to iterate through the results block stored in my dict variable and define each field for each pandas column?

This is what I am trying to avoid:

columns = ['filmid', 'title'....... ]

# create dataframe
df = pandas.DataFrame(columns=columns)

# append one row per film, naming every field by hand
for film in films:
    df.loc[len(df)] = [film['id'], film['title']........................]
Bonzay

1 Answer


You can use the json_normalize function from pandas, which turns a list of JSON records into a dataframe with one column per key. Here is an example:

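import json
import pandas as pd

# Load data (skip this step if the JSON is already in a dict)
with open('yourfile.json') as file:
    data = json.load(file)

# pd.json_normalize replaces pandas.io.json.json_normalize, deprecated since pandas 1.0
flat_json_df = pd.json_normalize(data['results'])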

For me, when used on the data above, it results in the following dataframe:

   adult  backdrop_path  genre_ids    id      original_language  original_title             overview                                           popularity  poster_path  release_date  title                      video  vote_average  vote_count
0  False  None           [99, 10402]  164558  en                 One Direction: This Is Us  Go behind the scenes during One Directions sel...  1.166982    None         2013-08-30    One Direction: This Is Us  False  8.45          55
1  False  None           [80, 18]     654     en                 On the Waterfront                                                             1.070310    None         1954-06-22    On the Waterfront          False  8.19          51
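
Since the question mentions the JSON is already held in a dict, the file-loading step can be skipped. The following is a minimal sketch under that assumption: the `data` dict is a hypothetical, trimmed copy of the response shown in the question, and `pd.json_normalize` assumes pandas 1.0 or newer. For flat records like these, passing the list straight to `pd.DataFrame` should give an equivalent frame.

```python
import pandas as pd

# Hypothetical in-memory response, trimmed from the JSON in the question.
data = {
    "page": 1,
    "results": [
        {"id": 164558, "title": "One Direction: This Is Us",
         "genre_ids": [99, 10402], "vote_average": 8.45},
        {"id": 654, "title": "On the Waterfront",
         "genre_ids": [80, 18], "vote_average": 8.19},
    ],
    "total_results": 61,
    "total_pages": 4,
}

# One row per result, one column per key; no manual column list needed.
df = pd.json_normalize(data["results"])

# Alternative for flat records: the plain constructor accepts a list of dicts.
df_plain = pd.DataFrame(data["results"])

print(df[["id", "title", "vote_average"]])
```

json_normalize becomes the better choice once the records contain nested dicts, because it flattens those into dotted column names, whereas the plain constructor leaves them as dict objects in a single column. List values such as genre_ids stay as lists in both cases.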
Koralp Catalsakal