I am trying to write JSON to a data.csv file. I tried the following solution from Stack Overflow: How do I write a Python dictionary to a csv file?

So I came up with this:

with open("data/dataGold.csv", 'w') as f:
    w = csv.DictWriter(f, ['data']['user']['repositories']['nodes'], extrasaction='ignore')
    w.writeheader()
    w.writerow(response)
    w.writerow([data['data']['user']['repositories']['nodes']['name'],
              data['data']['user']['repositories']['nodes']['forkCount'],
              data['data']['user']['repositories']['nodes']['issues']])

My response variable of type 'dict' is:

{'data': {'user': {'name': 'Markus Goldstein',
                   'repositories': {'nodes': [{'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'repache'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 3},
                                               'name': 'nf-hishape'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 7},
                                               'name': 'ip-countryside'},
                                              {'forkCount': 42,
                                               'issues': {'totalCount': 29},
                                               'name': 'bonesi'},
                                              {'forkCount': 13,
                                               'issues': {'totalCount': 3},
                                               'name': 'rapidminer-anomalydetection'},
                                              {'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'rapidminer-studio'}]}}}}

I get a TypeError complaining about the indices. I think that is because I used ['data']['user']['repositories']['nodes'].

The solution I linked above works because there is no nested dict/JSON involved. I have no idea how to do this in my case, where the dict/JSON is nested.

My goal is a CSV file with name, forkCount and issues as the header, followed by one line of values for each repo.

Maybe someone can help me. Sorry for my bad English, and thank you!

E.Lmo

3 Answers

Markus = {'data': {'user': {'name': 'Markus Goldstein',
                   'repositories': {'nodes': [{'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'repache'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 3},
                                               'name': 'nf-hishape'},
                                              {'forkCount': 4,
                                               'issues': {'totalCount': 7},
                                               'name': 'ip-countryside'},
                                              {'forkCount': 42,
                                               'issues': {'totalCount': 29},
                                               'name': 'bonesi'},
                                              {'forkCount': 13,
                                               'issues': {'totalCount': 3},
                                               'name': 'rapidminer-anomalydetection'},
                                              {'forkCount': 0,
                                               'issues': {'totalCount': 0},
                                               'name': 'rapidminer-studio'}]}}}}

with open('Markus.csv', 'w') as markus:
    # Header line
    print('name,forkCount,issues', file=markus)
    # One line per repository node
    for node in Markus['data']['user']['repositories']['nodes']:
        print('{},{},{}'.format(node['name'], node['forkCount'], node['issues']['totalCount']), file=markus)
  • The first print call writes the header line to the csv file.
  • The for loop iterates over the repository nodes in the dictionary.
  • The second print call writes one line per node to the csv file, reading totalCount from the nested issues dictionary.

The result is this.

name,forkCount,issues
repache,0,0
nf-hishape,4,3
ip-countryside,4,7
bonesi,42,29
rapidminer-anomalydetection,13,3
rapidminer-studio,0,0
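
One caveat: print does no quoting, so a repository name containing a comma would produce a broken row. The following is a minimal sketch of the same loop using csv.writer, which quotes such fields. The helper name nodes_to_csv and the second node (with a comma in its name) are hypothetical and only there to show the quoting; an in-memory buffer stands in for the file.

```python
import csv
import io

# Sample data: one node from the question plus a hypothetical node
# whose name contains a comma (not part of the original data).
nodes = [
    {'forkCount': 42, 'issues': {'totalCount': 29}, 'name': 'bonesi'},
    {'forkCount': 1, 'issues': {'totalCount': 2}, 'name': 'odd,name'},
]


def nodes_to_csv(nodes):
    """Return CSV text with a header and one row per repository node."""
    buffer = io.StringIO(newline='')
    # lineterminator='\n' keeps the output identical across platforms
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['name', 'forkCount', 'issues'])
    for node in nodes:
        # Pull the count out of the nested issues dictionary
        writer.writerow([node['name'], node['forkCount'], node['issues']['totalCount']])
    return buffer.getvalue()


csv_text = nodes_to_csv(nodes)
print(csv_text)
```

To write to disk instead, the same writer can be given a file opened with open('Markus.csv', 'w', newline='').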
Bill Bell

The code given below should work fine.

1] The extra loop that I have written flattens each node: it replaces the dictionary under issues with its totalCount value, so that the CSV comes out clean.

2] I am using deepcopy here because I do not want to modify the original data structure, so the loop works on a deep copy of it instead of a reference.

3] I cast wt_csv[0].keys() to a list because the .keys() function returns a dict_keys view in Python 3, not a list.

import copy
import csv

i_dict = {'data': {'user': {'name': 'Markus Goldstein',
                           'repositories': {'nodes': [{'forkCount': 0,
                                                       'issues': {'totalCount': 0},
                                                       'name': 'repache'},
                                                      {'forkCount': 4,
                                                       'issues': {'totalCount': 3},
                                                       'name': 'nf-hishape'},
                                                      {'forkCount': 4,
                                                       'issues': {'totalCount': 7},
                                                       'name': 'ip-countryside'},
                                                      {'forkCount': 42,
                                                       'issues': {'totalCount': 29},
                                                       'name': 'bonesi'},
                                                      {'forkCount': 13,
                                                       'issues': {'totalCount': 3},
                                                       'name': 'rapidminer-anomalydetection'},
                                                      {'forkCount': 0,
                                                       'issues': {'totalCount': 0},
                                                       'name': 'rapidminer-studio'}]}}}}


wt_csv = copy.deepcopy(i_dict['data']['user']['repositories']['nodes'])

for wc in wt_csv:
    wc['issues'] = wc['issues']['totalCount']

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('dataGold.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=list(wt_csv[0].keys()))
    dict_writer.writeheader()
    dict_writer.writerows(wt_csv)
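
As a possible alternative to deepcopy, each node can be turned into a new flat dictionary, which also leaves the original structure untouched. This is only a sketch: the helper name flatten_node is hypothetical, the data is shortened to two nodes from the question, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Shortened copy of the question's data (two nodes only).
i_dict = {'data': {'user': {'name': 'Markus Goldstein', 'repositories': {'nodes': [
    {'forkCount': 0, 'issues': {'totalCount': 0}, 'name': 'repache'},
    {'forkCount': 42, 'issues': {'totalCount': 29}, 'name': 'bonesi'},
]}}}}


def flatten_node(node):
    """Build a new flat dict; the input node itself is not modified."""
    return {
        'name': node['name'],
        'forkCount': node['forkCount'],
        'issues': node['issues']['totalCount'],
    }


rows = [flatten_node(n) for n in i_dict['data']['user']['repositories']['nodes']]

# Stands in for open('dataGold.csv', 'w', newline='')
output_file = io.StringIO(newline='')
dict_writer = csv.DictWriter(
    output_file,
    fieldnames=['name', 'forkCount', 'issues'],  # explicit column order
    lineterminator='\n',
)
dict_writer.writeheader()
dict_writer.writerows(rows)
result = output_file.getvalue()
print(result)
```

Listing the fieldnames explicitly fixes the column order to name, forkCount, issues instead of relying on the key order of the first node.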

Let me know in comments if something is unclear.

rohit keshav
  • That actually works perfect :) Thank you :) And because of your clear description, I understood all. – E.Lmo Jan 06 '18 at 19:15

Considering you're analyzing RapidMiner usage, you may also choose to just use RapidMiner Text Processing.

Here's the XML:

<?xml version="1.0" encoding="UTF-8"?>
<process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
        <parameter key="text" value="{&#10;  &quot;data&quot;: {&#10;    &quot;user&quot;: {&#10;      &quot;name&quot;: &quot;Markus Goldstein&quot;,&#10;      &quot;repositories&quot;: {&#10;        &quot;nodes&quot;: [&#10;          {&#10;            &quot;forkCount&quot;: 0,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 0&#10;            },&#10;            &quot;name&quot;: &quot;repache&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 4,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 3&#10;            },&#10;            &quot;name&quot;: &quot;nf-hishape&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 4,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 7&#10;            },&#10;            &quot;name&quot;: &quot;ip-countryside&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 42,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 29&#10;            },&#10;            &quot;name&quot;: &quot;bonesi&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 13,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 3&#10;            },&#10;            &quot;name&quot;: &quot;rapidminer-anomalydetection&quot;&#10;          },&#10;          {&#10;            &quot;forkCount&quot;: 0,&#10;            &quot;issues&quot;: {&#10;              &quot;totalCount&quot;: 0&#10;            },&#10;            &quot;name&quot;: &quot;rapidminer-studio&quot;&#10;          }&#10;        ]&#10;      }&#10;    }&#10;  }&#10;}"
        />
      </operator>
      <operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="179" y="34" />
      <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1" />
      <connect from_op="JSON To Data" from_port="example set" to_port="result 1" />
      <portSpacing port="source_input 1" spacing="0" />
      <portSpacing port="sink_result 1" spacing="0" />
      <portSpacing port="sink_result 2" spacing="0" />
    </process>
  </operator>
</process>
Scott Genzer