0

My intention is to convert a JSON file into a CSV file. But the particularity of the JSON file is that in turn is composed of several json. As I show here, a json file with two json lines.

{"node": {"id": "1097084148638274164", "__typename": "GraphImage", "edge_media_to_caption": {"edges": [{"node": {"text": "Detalle de la fachada de nuestro Ayuntamiento #Utrera #ayuntamientodeutrera #igersutrera"}}]}, "shortcode": "85oDh4S9J0", "edge_media_to_comment": {"count": 0}, "comments_disabled": false, "taken_at_timestamp": 1445002643, "dimensions": {"height": 608, "width": 1080}, "display_url": "links", "edge_media_preview_like": {"count": 13}, "owner": {"biography": "Cuenta oficial del Ayuntamiento de Utrera #UtreraSeMueve", "blocked_by_viewer": false, "country_block": false, "external_url": "links", "external_url_linkshimmed": "links", "edge_followed_by": {"count": 2819}, "followed_by_viewer": false, "edge_follow": {"count": 1317}, "follows_viewer": false, "full_name": "Ayuntamiento de Utrera", "has_channel": false, "has_blocked_viewer": false, "highlight_reel_count": 5, "has_requested_viewer": false, "id": "1969879851", "is_business_account": true, "is_private": false, "is_verified": false, "edge_mutual_followed_by": {"count": 0, "edges": []}, "profile_pic_url": "links", "profile_pic_url_hd": "links", "requested_by_viewer": false, "username": "instautrera", "connected_fb_page": null, "iphone_struct": {"pk": 1969879851, "is_private": false, "reel_auto_archive": "on", "profile_pic_id": "1756764239516976180_1969879851", "follower_count": 2819, "external_url": "http://www.utrera.org/", "biography": "Cuenta oficial del Ayuntamiento de Utrera #UtreraSeMueve", "usertags_count": 362, "hd_profile_pic_versions": [{"height": 320, "url": "links", "width": 320}, {"height": 640, "url": "links", "width": 640}], "is_verified": false, "has_highlight_reels": true, "external_lynx_url": "links", "username": "instautrera", "profile_pic_url": "links", "following_count": 1317, "media_count": 706, "hd_profile_pic_url_info": {"height": 959, "url": "links", "width": 959}, "has_anonymous_profile_picture": false, "full_name": "Ayuntamiento de Utrera", "is_potential_business": false, "auto_expand_chaining": false, "highlight_reshare_disabled": false}}, "thumbnail_src": "links", "thumbnail_resources": [{"src": "links", "config_width": 150, "config_height": 150}, {"src": "links", "config_width": 240, "config_height": 240}, {"src": "links", "config_width": 320, "config_height": 320}, {"src": "links", "config_width": 480, "config_height": 480}, {"src": "links", "config_width": 640, "config_height": 640}], "is_video": false}, "instaloader": {"version": "4.1", "node_type": "Post"}}
{"node": {"id": "1097144067710243168", "__typename": "GraphImage", "edge_media_to_caption": {"edges": [{"node": {"text": "\u00a1Qu\u00e9 buena estampa! Nuevo c\u00e9sped artificial con el Santuario de fondo.\n#deporte #Utrera #igersutrera #ayuntamientodeutrera"}}]}, "shortcode": "851rd3S9Fg", "edge_media_to_comment": {"count": 0}, "comments_disabled": false, "taken_at_timestamp": 1445009786, "dimensions": {"height": 769, "width": 1080}, "display_url": links", "edge_media_preview_like": {"count": 20}, "owner": {"biography": "Cuenta oficial del Ayuntamiento de Utrera #UtreraSeMueve", "blocked_by_viewer": false, "country_block": false, "external_url": "links", "external_url_linkshimmed": "links", "edge_followed_by": {"count": 2819}, "followed_by_viewer": false, "edge_follow": {"count": 1317}, "follows_viewer": false, "full_name": "Ayuntamiento de Utrera", "has_channel": false, "has_blocked_viewer": false, "highlight_reel_count": 5, "has_requested_viewer": false, "id": "1969879851", "is_business_account": true, "is_private": false, "is_verified": false, "edge_mutual_followed_by": {"count": 0, "edges": []}, "profile_pic_url": "links", "profile_pic_url_hd": "links", "requested_by_viewer": false, "username": "instautrera", "connected_fb_page": null, "iphone_struct": {"pk": 1969879851, "is_private": false, "reel_auto_archive": "on", "profile_pic_id": "1756764239516976180_1969879851", "follower_count": 2819, "external_url": "http://www.utrera.org/", "biography": "Cuenta oficial del Ayuntamiento de Utrera #UtreraSeMueve", "usertags_count": 362, "hd_profile_pic_versions": [{"height": 320, "url": "links", "width": 320}, {"height": 640, "url": "links", "width": 640}], "is_verified": false, "has_highlight_reels": true, "external_lynx_url": "links", "username": "instautrera", "profile_pic_url": "links", "following_count": 1317, "media_count": 706, "hd_profile_pic_url_info": {"height": 959, "url": "links", "width": 959}, "has_anonymous_profile_picture": false, "full_name": "Ayuntamiento de Utrera", "is_potential_business": false, "auto_expand_chaining": false, "highlight_reshare_disabled": false}}, "thumbnail_src": "links", "thumbnail_resources": [{"src": "links", "config_width": 150, "config_height": 150}, {"src": "links", "config_width": 240, "config_height": 240}, {"src": "links", "config_width": 320, "config_height": 320}, {"src": "links", "config_width": 480, "config_height": 480}, {"src": "links", "config_width": 640, "config_height": 640}], "is_video": false}, "instaloader": {"version": "4.1", "node_type": "Post"}}

I intend to convert it to csv, since both share the same header. I have used several codes that I have seen on the internet and none of them serve me. Can you guide me? I'm still new to Python and I'm a little confused about how to do it.

Zoe
  • 27,060
  • 21
  • 118
  • 148
David Perea
  • 139
  • 3
  • 12
  • How should the nested JSONs be treated? As one entry separated by commas or as many separated entries? In the first case use `pandas.read_json()` and `pandas.DataFrame.to_csv()`. In the second case, you have to specify how you want the csv file to look like. – gehbiszumeis Sep 14 '18 at 12:28
  • 1
    Give an example output please :) – amirouche Sep 14 '18 at 12:37
  • "But the particularity of the JSON file is that in turn is composed of several json" => this is named "jsonlines", cf http://jsonlines.org/ and there's a python lib: https://jsonlines.readthedocs.io/en/latest/ – bruno desthuilliers Sep 14 '18 at 14:11
  • 1
    "I have used several codes that I have seen on the internet and none of them serve me" Well that's indeed the problem with "copy/paste programming" xD – bruno desthuilliers Sep 14 '18 at 14:12
  • 1
    If you are trying to convert a nested json to csv this might help. https://stackoverflow.com/a/1872081/10215730 – Sharat Chandra Sep 15 '18 at 07:52
  • @gehbiszumeis What I want is that each component of the josn lines be the header. That is, `id`, `_typemedia`, etc is the header and so the csv file is filled. What I want is to create a CSV with the json data nested. – David Perea Sep 18 '18 at 11:29

2 Answers2

2

Lets say you have a json like

    x=[{
    "pk": 22,
    "model": "auth.permission",
    "fields": {
        "codename": "add_logentry",
        "name": "Can add log entry",
        "content_type": 8
    }
},
{
    "pk": 23,
    "model": "auth.permission",
    "fields": {
        "codename": "change_logentry",
        "name": "Can change log entry",
        "content_type": 8
    },....]

then you can extract the data with a script like

    x = json.loads(x)

    f = csv.writer(open("test.csv", "wb+"))

    # Write CSV Header, If you dont need that, remove this line
    f.writerow(["pk", "model", "codename", "name", "content_type"])

    for x in x:
        f.writerow([x["pk"],
            x["model"],
            x["fields"]["codename"],
            x["fields"]["name"],
            x["fields"]["content_type"]])

to get an output like

   pk,model,codename,name,content_type
   22,auth.permission,add_logentry,Can add log entry,8
   23,auth.permission,change_logentry,Can change log entry,8
Sharat Chandra
  • 191
  • 1
  • 11
  • When running the code, get several errors, that I can solve it. However when it comes to the ForLoop, it gives me the following error: `f.writerow ([x ["node"]["id"], TypeError: string indices must be integers` I can not solve it. Might you help me? – David Perea Sep 18 '18 at 10:58
  • In advance I modified `f = csv.writer(open("test.csv", "wb+"))` for `f = csv.writer(open("test.csv", "w"))` because if I used the first one, binary form, I get the following error: `TypeError: a bytes-like object is required, not 'str'`. While for the last one I do not get error – David Perea Sep 18 '18 at 11:07
  • 1
    Try reading the json file like this `x = json.loads(open('studentjson.json').read())` `f = csv.writer(open("test.csv", "w+"))` – Sharat Chandra Sep 18 '18 at 13:23
1

After reading the JSON you can flatten it using pandas.

import pandas as pd
from pandas.io.json import json_normalize

data = json.load(YOUR DATA) #Assuming you are loading the json from a file
norm_data = pd.DataFrame(json_normalize(data))

Then convert it into a csv file

norm_data.to_csv('filename.csv')
Srikar
  • 71
  • 1
  • 5