
The JSON I'm trying to parse has a complex structure. The full list contains 311 dictionaries, from which I managed to extract some key values. However, each of those 311 dictionaries also has a key 'cadCargosFuncoes', whose value is a dictionary with one key, whose value is itself a list of dictionaries.

What I need is, for each of those 311 dictionaries in the JSON, to grab the value of the key 'funDes' in each of its n inner dictionaries, where n varies from record to record and may be zero.
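For reference, the access pattern I have in mind looks something like this (a sketch only; `get_fun_des` and `mp` are just illustrative names, not part of my code):

```python
# Sketch: collect every 'funDes' value for one of the 311 dictionaries,
# here called `mp`, guarding against the key being missing or null.
def get_fun_des(mp):
    cargos = mp.get('cadCargosFuncoes') or {}
    funcoes = cargos.get('pt_ar_wsgode_objectos_DadosCargosFuncoes') or []
    return [f['funDes'] for f in funcoes if f.get('funDes')]
```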

I've tried to introduce a loop:

for i in j['cadCargosFuncoes']['pt_ar_wsgode_objectos_DadosCargosFuncoes']:

in several different ways, but I was never able to achieve what I need.

import requests
from csv import DictWriter

r = requests.get('http://app.parlamento.pt/webutils/docs/doc.txt?path=6148523063446f764c324679626d56304c3239775a57356b595852684c3052685a47397a51574a6c636e5276637939535a576470633352764a544977516d6c765a334c446f575a705932387657456c4a535355794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e7657456c4a53563971633239754c6e523464413d3d&fich=RegistoBiograficoXIII_json.txt&Inline=true')
r.raise_for_status()
data = r.json()

mps = data['RegistoBiografico']['RegistoBiograficoList']['pt_ar_wsgode_objectos_DadosRegistoBiograficoWeb']

for j in mps:
    # fill in missing keys with the '-88' placeholder in one pass
    j.setdefault('cadProfissao', '-88')
    j.setdefault('cadDtNascimento', '-88')

result = [{"ID": j["cadId"], "Nome": j["cadNomeCompleto"], "Sexo": j["cadSexo"], "D-Nasc": j["cadDtNascimento"], "Profissao": j["cadProfissao"]} for j in mps]

with open('bio_xiii.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = DictWriter(outfile, ('ID', 'Nome', 'Sexo', 'D-Nasc', 'Profissao'))
    writer.writerows(result)

Currently, my CSV file has 311 rows, with 'cadId', 'cadNomeCompleto', 'cadSexo', 'cadDtNascimento', and 'cadProfissao'. What I want is m additional columns, where m is the maximum number of inner dictionaries containing the key 'funDes' across the 311 records. Each cell should contain the value of 'funDes' when it exists, and '-88' when it doesn't (either because the value is null or because that record has fewer than m inner dictionaries).
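One approach I can imagine (a sketch, not tested against the real feed; `pad_fun_des` is just an illustrative helper name) would be to pad each record's list of 'funDes' values with '-88' up to the common maximum length m:

```python
# Sketch: pad each record's list of 'funDes' values with '-88' so that
# every record ends up with exactly m entries, where m is the length of
# the longest list seen across all records.
def pad_fun_des(fun_des_lists, fill='-88'):
    m = max((len(v) for v in fun_des_lists), default=0)
    return [v + [fill] * (m - len(v)) for v in fun_des_lists], m

rows, m = pad_fun_des([['a', 'b'], [], ['c']])
# rows is [['a', 'b'], ['-88', '-88'], ['c', '-88']] and m is 2
```

The extra fieldnames could then be generated as `[f'Funcao{i+1}' for i in range(m)]` (the 'Funcao' names are just an example) and merged into each row dict before writing with `DictWriter`.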

Can someone give some advice on how to work this out?

  • You can use the following: [How to flatten nested JSON recursively, with flatten_json?](https://stackoverflow.com/questions/58442723/how-to-flatten-nested-json-recursively-with-flatten-json) with `df = pd.DataFrame([flatten_json(x) for x in data['RegistoBiografico']['RegistoBiograficoList']['pt_ar_wsgode_objectos_DadosRegistoBiograficoWeb']])`. This will return the flattened JSON in a pandas dataframe. I tested it, so I know it works. Then you can get your columns of interest. – Trenton McKinney Oct 21 '19 at 17:16
  • You may also want `pd.set_option('display.max_columns', 700)` – Trenton McKinney Oct 21 '19 at 17:24
  • Thank you for your reply. This helps me a lot. However, now I have two additional problems, which I was trying to avoid by using `DictWriter`. `df` is of class `pandas.core.frame.DataFrame`, which requires something like `df.to_csv('json_test.csv')` to export to a CSV file. However, when I do so, (1) the information appears all mixed up in a few cells and columns, instead of the organized 311 rows × 633 columns, and (2) some names aren't exported properly - e.g. ADÃƒO JOSÃ‰ FONSECA SILVA instead of ADÃO JOSÉ FONSECA SILVA. What should I do differently here? – João Jerónimo Oct 21 '19 at 23:19
  • Maybe update pandas. I'm on v0.25.2. I saved with to_csv and brought it back with read_csv and, at least for that name, it imported as expected (e.g. `ADÃO JOSÉ FONSECA SILVA`). A quick look at the csv shows accented characters as they should be. The df shows 633 columns by 311 rows as expected. – Trenton McKinney Oct 21 '19 at 23:38
  • My version was, indeed, out of date, but the problem persists - no accents and not one key per column. I now tried `df.to_csv('bio_test.csv', encoding='iso-8859-1', sep=';')`, and Python managed to correctly fill 5 rows - right accents and one key per column - before raising the following error: `UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 3434: ordinal not in range(256)`. Is there some `encoding` value to deal with this? I'm very new to Python, so I don't fully understand what the problem is... – João Jerónimo Oct 22 '19 at 09:23
  • Ok, I managed to do it with `ExcelWriter` :) :) thank you once again for all your help! – João Jerónimo Oct 22 '19 at 09:46
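A note on the `UnicodeEncodeError` discussed above: iso-8859-1/latin-1 simply cannot represent characters such as '–' (U+2013). A sketch of an alternative that keeps the output as CSV (the file name and row content are only examples):

```python
import csv

# Sketch: 'utf-8-sig' writes a UTF-8 byte-order mark, which lets Excel
# detect the encoding, so accented names and characters like '\u2013'
# survive the round trip.
with open('bio_test.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerow(['Nome'])
    writer.writerow(['ADÃO JOSÉ FONSECA SILVA \u2013 Deputado'])
```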

0 Answers