complex json file to csv in python

Question

I need to convert a complex json file to csv using python, I tried a lot of codes without success, I came here for help,I updated the question, the JSON file is about a million,I need to convert them to csv format

csv file

{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005",
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY",
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805",
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX",
        }
    }
}

How is the csv supposed to look like? If we suppose you need a dataframe structure, what are the columns, what the rows? — Rockbar, Jul 17 '22 at 13:44
`cvs` keeps all as simple `rows` and `columns` but you have nested elements - you have to show what result you expect. Should it flatten it or it should keep as nested elements (dictionary inside cell)? — furas, Jul 17 '22 at 14:59
if you tried some methods then you should show it in question (not in comments) - this way we could see what wrong you did, and we could suggest how to fix it. — furas, Jul 17 '22 at 15:00
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Jul 17 '22 at 17:39

diml · Accepted Answer · 2022-07-17T21:12:38.213

import pandas as pd

null = 'null'

data = {
    "_id": {
        "$oid": "2e3230s314i5dc07e118c8bo"
    },
    "add": {
        "address": {
            "address_type": "Door",
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
        },
        "name": "Evgeniya Kovantceva",
        "type": "Private person",
        "code": null,
        "additional_phone_nums": null,
        "email": null,
        "notifications": [],
        "address": {
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
            "country": "US",
            "country_name": "NY",
        }
    }
}

df = pd.json_normalize(data)
df.to_csv('yourpath.csv')

Beware the null value. The "address" nested dictionary comes inside "add" two times almost identical?

EDIT

Ok after your information it looks like json.JSONDecoder() is what you need.

Originally posted by @pschill on this link: how to analyze json objects that are NOT separated by comma (preferably in Python)

I tried his code on your data:

import json 
import pandas as pd

data = """{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005"
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY"
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805"
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX"
        }
    }
}"""

Keep in mind that your data also has trailing commas which makes the data unreadable (the last commas right before every closing bracket).

You have to remove them with some regex or another approach I am not familiar with. For the purpose of this answer I removed them manually.

So after that I tried this:

content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)

which gave me an error but the list seems to get populated with all the values:

parsed_values
[{'_id': {'$oid': '2e3230'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
   'name': 'Evgiya Kovava',
   'address2': {'country': 'US', 'country_name': 'NY'}}},
 {'_id': {'$oid': '2d118c8bo'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
   'name': 'Eiya tceva',
   'address2': {'country': 'US', 'country_name': 'TX'}}}]

next I did:

df = pd.json_normalize(parsed_values)

which worked fine. You can always save that to a csv with:

df.to_csv('yourpath.csv')

Tell me if that helped.

Your json is quite problematic after all. Duplicate keys (problem), null value (unreadable), trailing commas (unreadable), not comma separated dicts... It didn't catch the eye at first :P

However, it errors when doing a lot of conversions, how can I load the JSON file as well as flatten it — lex9527, Jul 17 '22 at 18:13
My answer was for the specific json. What do u mean a lot of conversions? Other jsons? With other structure-data? What errors do u get? You understand that my answer was based only on the data given. You could provide more information I guess — diml, Jul 17 '22 at 18:19
I mean how to open json file using "open" or "read_json", my json file has about a million entries — lex9527, Jul 17 '22 at 18:48
You provided two dictionaries (somehow different from the original post-they dont have the duplicate address key etc) but something is missing again. Are they inside a list? [ {dict1}, {dict2} ] inside a string? ' {dict1}, {dict2} ' or inside a dictionary? { {dict1}, {dict2} } — diml, Jul 17 '22 at 19:18
form：{dict1}{dict2}{dict3}{dict4} The json sample is the same as the first time,but,The json file was too long to post, so I removed some unnecessary information, — lex9527, Jul 17 '22 at 20:06
If for instance these two objects where inside a list separated by a comma [ {obj1}, {obj2} ] , pd.json_normalize() would suite you well. What is the first and the last character of your json and what seperates the objects? I still feel you don't give all the data needed...alse keep in mind that in the original post there is a duplicate key (address) which gets overriden by the last so you eventually lose "address_type" information unless you think of something else. — diml, Jul 17 '22 at 20:08

complex json file to csv in python

1 Answers1

Linked