Python 3 - Load JSON data into my .csv file

Question

I am trying to write JSON data to a data.csv file. I tried the following solution from Stack Overflow: "How do I write a Python dictionary to a csv file?"

So I came up with this:

with open("data/dataGold.csv", 'w') as f:
    w = csv.DictWriter(f, ['data']['user']['repositories']['nodes'], extrasaction='ignore')
    w.writeheader()
    w.writerow(response)
    w.writerow([data['data']['user']['repositories']['nodes']['name'],
              data['data']['user']['repositories']['nodes']['forkCount'],
              data['data']['user']['repositories']['nodes']['issues']])

My response variable (of type dict) is:

{'data': {'user': {'name': 'Markus Goldstein',
                   'repositories': {'nodes': [{'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'repache'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 3},
                                               'name': 'nf-hishape'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 7},
                                               'name': 'ip-countryside'},
                                              {'forkCount': 42,
                                               'issues': {'totalCount': 29},
                                               'name': 'bonesi'},
                                              {'forkCount': 13,
                                               'issues': {'totalCount': 3},
                                               'name': 'rapidminer-anomalydetection'},
                                              {'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'rapidminer-studio'}]}}}}

I get a TypeError saying that indices are not allowed. I think that is because I used ['data']['user']['repositories']['nodes'].
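The guess above is right about where the error comes from: in the fieldnames argument, ['data'] is a one-element list literal, and indexing a list with the string 'user' raises the TypeError. The path only works when applied to the dict itself, and even then 'nodes' holds a list, so ['name'] has to be applied to each element. A minimal check, assuming the dict is bound to the name response as in the question (the variable path_start is hypothetical, used only for the demo):

```python
# A one-element list literal, like the ['data'] at the start of the question's path
path_start = ['data']

try:
    path_start['user']  # indexing a list with a string
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # list indices must be integers or slices, not str

# The key path has to be applied to the dict itself (shortened copy of the question's data)
response = {'data': {'user': {'name': 'Markus Goldstein',
                              'repositories': {'nodes': [
                                  {'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
                                  {'forkCount': 4, 'issues': {'totalCount': 3}, 'name': 'nf-hishape'},
                              ]}}}}

nodes = response['data']['user']['repositories']['nodes']
# nodes is a list of dicts, so ['name'] must be applied to each element, not to the list
print([node['name'] for node in nodes])  # ['repache', 'nf-hishape']
```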

The solution I linked above works because its data has no nested dict/JSON, so I have no idea how to handle my case, where the dict/JSON is nested.

My goal is a CSV file with name, forkCount and issues as the header, followed by one line per repository containing its values.
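The target layout can be sketched with csv.DictWriter and an explicit fieldnames list, which also fixes the column order to name, forkCount, issues. This is a minimal sketch, not taken from any of the answers: the helper names node_rows and write_repo_csv are hypothetical, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

# Shortened copy of the question's response dict
response = {'data': {'user': {'name': 'Markus Goldstein',
                              'repositories': {'nodes': [
                                  {'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
                                  {'forkCount': 4, 'issues': {'totalCount': 3}, 'name': 'nf-hishape'},
                                  {'forkCount': 42, 'issues': {'totalCount': 29}, 'name': 'bonesi'},
                              ]}}}}

FIELDNAMES = ['name', 'forkCount', 'issues']  # header order asked for in the question


def node_rows(data):
    """Yield one flat dict per repository, lifting issues.totalCount into 'issues'."""
    for node in data['data']['user']['repositories']['nodes']:
        yield {'name': node['name'],
               'forkCount': node['forkCount'],
               'issues': node['issues']['totalCount']}


def write_repo_csv(data, f):
    """Write the header plus one line per repository to the open text file f."""
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(node_rows(data))


buffer = io.StringIO()  # in-memory stand-in for a real file
write_repo_csv(response, buffer)
print(buffer.getvalue())
```

With a real file, open it with newline='' as the csv documentation recommends, e.g. with open('data/dataGold.csv', 'w', newline='') as f: write_repo_csv(response, f).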

Can someone help me? Sorry for my bad English. Thank you!

score 0 · Answer 1 · answered Jan 06 '18 at 18:25

Markus = {'data': {'user': {'name': 'Markus Goldstein',
                   'repositories': {'nodes': [{'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'repache'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 3},
                                               'name': 'nf-hishape'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 7},
                                               'name': 'ip-countryside'},
                                              {'forkCount': 42,
                                               'issues': {'totalCount': 29},
                                               'name': 'bonesi'},
                                              {'forkCount': 13,
                                               'issues': {'totalCount': 3},
                                               'name': 'rapidminer-anomalydetection'},
                                              {'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'rapidminer-studio'}]}}}}

with open('Markus.csv', 'w') as markus:
    # Header line
    print('name,forkCount,issues', file=markus)
    # One comma-separated line per repository node
    for node in Markus['data']['user']['repositories']['nodes']:
        print('{},{},{}'.format(node['name'], node['forkCount'], node['issues']['totalCount']), file=markus)

The first print call writes the header line to the CSV file.
The for loop iterates over the repository nodes in the dictionary.
The second print call writes each node's name, forkCount and issues totalCount as one line of the CSV file.
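The same loop can be written with csv.writer instead of formatting lines by hand. A minimal sketch, assuming the Markus dict from this answer (shortened here and renamed markus): the output for this data is identical, but csv.writer also quotes any field that happens to contain a comma, which the '{},{},{}' format string would not do. The 'a,b' value at the end is a made-up example, not part of the data.

```python
import csv
import io

# Shortened copy of the Markus dict from this answer
markus = {'data': {'user': {'name': 'Markus Goldstein',
                            'repositories': {'nodes': [
                                {'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
                                {'forkCount': 4, 'issues': {'totalCount': 3}, 'name': 'nf-hishape'},
                            ]}}}}

buffer = io.StringIO()  # stands in for open('Markus.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(['name', 'forkCount', 'issues'])  # header line
for node in markus['data']['user']['repositories']['nodes']:
    # Same three values as the print-based loop, but csv.writer handles quoting
    writer.writerow([node['name'], node['forkCount'], node['issues']['totalCount']])

print(buffer.getvalue())

# Hypothetical value, only to show that csv.writer quotes a field containing a comma
demo = io.StringIO()
csv.writer(demo).writerow(['a,b', 1, 2])
print(demo.getvalue())  # "a,b",1,2
```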

The result is:

name,forkCount,issues
repache,0,0
nf-hishape,4,3
ip-countryside,4,7
bonesi,42,29
rapidminer-anomalydetection,13,3
rapidminer-studio,0,0

Thank you too. It also works, but I prefer the accepted solution :) — E.Lmo, Jan 06 '18 at 19:16

rohit keshav · Accepted Answer · 2018-01-06T19:21:27.180

The code given below should work fine:

1] The extra loop replaces the dictionary under issues with the value of its totalCount key, so that the CSV comes out clean.

2] I am using deepcopy because I do not want to modify the original data structure; instead of working on a reference to it, the loop works on a deep copy.

3] I convert wt_csv[0].keys() to a list because .keys() returns a dict_keys view in Python 3, not a list.

import csv
import copy

i_dict = {'data': {'user': {'name': 'Markus Goldstein',
                           'repositories': {'nodes': [{'forkCount': 0,
                                                       'issues': {'totalCount': 0},
                                                       'name': 'repache'},
                                                      {'forkCount': 4,
                                                       'issues': {'totalCount': 3},
                                                       'name': 'nf-hishape'},
                                                      {'forkCount': 4,
                                                       'issues': {'totalCount': 7},
                                                       'name': 'ip-countryside'},
                                                      {'forkCount': 42,
                                                       'issues': {'totalCount': 29},
                                                       'name': 'bonesi'},
                                                      {'forkCount': 13,
                                                       'issues': {'totalCount': 3},
                                                       'name': 'rapidminer-anomalydetection'},
                                                      {'forkCount': 0,
                                                       'issues': {'totalCount': 0},
                                                       'name': 'rapidminer-studio'}]}}}}


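# Work on a deep copy of the node list so the original dict stays untouched
wt_csv = copy.deepcopy(i_dict['data']['user']['repositories']['nodes'])

# Replace each nested issues dict with its totalCount value
for wc in wt_csv:
    wc['issues'] = wc['issues']['totalCount']

# newline='' is what the csv docs recommend; it avoids blank lines on Windows
with open('dataGold.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=list(wt_csv[0].keys()))
    dict_writer.writeheader()
    dict_writer.writerows(wt_csv)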
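Point 2 (why deepcopy rather than a plain copy) can be checked directly. A minimal sketch, assuming a shortened two-node sample; the helper fresh_nodes is hypothetical. A shallow copy creates a new list but shares the inner dicts, so the flattening loop would still modify the original. The last part shows an alternative that avoids copying altogether by building new flat dicts.

```python
import copy


def fresh_nodes():
    """Return a new two-node sample shaped like the question's 'nodes' list."""
    return [{'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
            {'forkCount': 4, 'issues': {'totalCount': 3}, 'name': 'nf-hishape'}]


# Shallow copy: a new list, but the dicts inside are shared with the original
original = fresh_nodes()
shallow = copy.copy(original)
for wc in shallow:
    wc['issues'] = wc['issues']['totalCount']
after_shallow = original[0]['issues']
print(after_shallow)  # 0 -> the original was modified too

# Deep copy (as in the answer): the dicts are duplicated, so the original survives
original = fresh_nodes()
deep = copy.deepcopy(original)
for wc in deep:
    wc['issues'] = wc['issues']['totalCount']
after_deep = original[0]['issues']
print(after_deep)  # {'totalCount': 0}

# Alternative without copying: build new flat dicts and leave the input alone
original = fresh_nodes()
rows = [{**node, 'issues': node['issues']['totalCount']} for node in original]
print(rows[1])  # {'forkCount': 4, 'issues': 3, 'name': 'nf-hishape'}
print(original[1]['issues'])  # {'totalCount': 3}
```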

Let me know in comments if something is unclear.

That actually works perfectly :) Thank you :) And thanks to your clear description, I understood everything. — E.Lmo, Jan 06 '18 at 19:15

score 0 · Answer 3 · answered Jan 08 '18 at 17:15

So, considering you're analyzing RapidMiner usage, you could also just use RapidMiner Text Processing:

Here is the process XML:

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
        <parameter key="text" value="{&#10;  &quot;data&quot;: {&#10;    &quot;user&quot;: {&#10;      &quot;name&quot;: &quot;Markus Goldstein&quot;,&#10;      &quot;repositories&quot;: {&#10;        &quot;nodes&quot;: [&#10;          {&#10;            &quot;forkCount&quot;: 0,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 0&#10;            },&#10;            &quot;name&quot;: &quot;repache&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 4,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 3&#10;            },&#10;            &quot;name&quot;: &quot;nf-hishape&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 4,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 7&#10;            },&#10;            &quot;name&quot;: &quot;ip-countryside&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 42,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 29&#10;            },&#10;            &quot;name&quot;: &quot;bonesi&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 13,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 3&#10;            },&#10;            &quot;name&quot;: &quot;rapidminer-anomalydetection&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 0,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 0&#10;            },&#10;            &quot;name&quot;: &quot;rapidminer-studio&quot;&#10;          }&#10;        ]&#10;      }&#10;    }&#10;  }&#10;}"
        />
      </operator>
      <operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="179" y="34" />
      <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1" />
      <connect from_op="JSON To Data" from_port="example set" to_port="result 1" />
      <portSpacing port="source_input 1" spacing="0" />
      <portSpacing port="sink_result 1" spacing="0" />
      <portSpacing port="sink_result 2" spacing="0" />
    </process>
  </operator>
</process>
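For anyone who wants similar automatic flattening of nested JSON without leaving Python, a minimal recursive flattener is sketched below. It is a plain-Python analogue, not a reproduction of the JSON To Data operator: the helper name flatten and the dot-separated column names (issues.totalCount) are illustrative choices and not necessarily the naming RapidMiner produces. Lists are left as-is, so it is applied per node rather than to the whole response.

```python
import csv
import io


def flatten(obj, prefix=''):
    """Flatten nested dicts into one dict with dot-separated keys (lists are left as-is)."""
    flat = {}
    for key, value in obj.items():
        name = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested dicts
        else:
            flat[name] = value
    return flat


# Shortened copy of the question's 'nodes' list
nodes = [{'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
         {'forkCount': 42, 'issues': {'totalCount': 29}, 'name': 'bonesi'}]

rows = [flatten(node) for node in nodes]
buffer = io.StringIO()  # in-memory stand-in for a real file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```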
