Json to csv in normalized form

Question

Json Format:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}]
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

I am trying to convert it to .csv file. But I am getting errors. I want the csv file to be normalized in 1NF so that I can directly transfer to some database

MyCode:

import json
import csv
with open("data3.json") as json_file, open("data3.csv", "w",encoding='utf-8') as csv_file:
    csv_file = csv.writer(csv_file)
    a = json.load(json_file)
    csv_file.writerow(["StudiosId", "Id", "Title","personId","Department","Job","ReleaseDate","PosterLink","Genres","Cast","Runtime"])
    for item in a:
        csv_file.writerow([item["studios"], item["id"], item["title"],item["crew"][0],item["crew"][1],item["crew"][2],item["poster"],item["genres"],item["releasedate"],item["cast"],item["runtime"]])

Error:

Traceback (most recent call last):
File "<stdin>", line 7, in <module>
IndexError: list index out of range

The problem is that you're using Python 2 which doesn't support the `encoding` parameter (it's new in Python 3). Either install Python 3 or take a look at http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2. — maxymoo, Oct 26 '15 at 02:14
I tried with Python 3.5.0 too. It gives the following error: Traceback (most recent call last): File "", line 7, in TypeError: list indices must be integers or slices, not str — Blank, Oct 26 '15 at 02:25

ajsp · Answer 1 · 2015-10-26T05:45:43.507

You're JSON structure is incorrect (missing comma), the code looks ok. Try this instead:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}],
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

There is also no poster or runtime keys, in the sample you provided.

EDIT update.

import csv
import json

with open('data.json', 'r') as f, open("data3.csv", "w") as csv_file:
    csv_file = csv.writer(csv_file)
    a = json.load(f)
    csv_file.writerow(["StudiosId", "Id", "Title","personId","Department","Job","ReleaseDate","PosterLink","Genres","Cast","Runtime"])
    for item in a:
        csv_file.writerow([item["studios"], item["id"], item["title"],item["crew"][0]['personId'],item["crew"][0]['department'],item["crew"][0]['job'],item["genres"],item["releasedate"],item["cast"]])

It is not able to separate "crew" and "cast" as they have mutiple values. — Blank, Oct 26 '15 at 04:01
@Blank updated, this works fine on my end, but if not post a more complete example. — ajsp, Oct 26 '15 at 04:22
I want the crew data in 3 different table, First being the personId,second being the Job, and lastly the Department. Now it is just separating the director, music ,writer etc. — Blank, Oct 26 '15 at 05:10
@Blank Then you will need to rethink the method you are using, and that has nothing to do with the question asked. Why don't you go straight to database? It would save you an awful lot of trouble. — ajsp, Oct 26 '15 at 05:20
I was using this approach as I felt importing csv file into any database will be lot simpler. — Blank, Oct 26 '15 at 05:22
@Blank ok there you go, you'll still have to update the missing keys. — ajsp, Oct 26 '15 at 05:47

Json to csv in normalized form

1 Answers1