I have a large number of dicts that I converted from Twitter JSON data, and I want to write them all to a single .csv file. I searched the site, but the existing solutions seem to fit dicts with very few keys, or dicts that already exist in memory. In my case the number of keys is somewhat higher, and I convert each JSON file to a dict inside a loop. In other words, I want to write each dict to the .csv file as soon as its JSON file has been converted.

Here's my code so far:

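import os

json_path = "C://Users//msalj//OneDrive//Desktop//pypr//Tweets"
for filename in os.listdir(json_path):
    # os.listdir returns bare file names, so join them with the directory
    with open(os.path.join(json_path, filename), 'r') as infh:
        for data in json_parse(infh):  # json_parse is my own helper; it yields one dict per tweet
            ...  # this is where each dict should be written to the .csv file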

And here is a sample of one of my converted dicts:

{'actor': {'displayName': 'RIMarkable',
           'favoritesCount': 0,
           'followersCount': 0,
           'friendsCount': 0,
           'id': 'id:twitter.com:3847371',
           'image': 'Picture_13.png',
           'languages': ['en'],
           'link': 'ht........ble',
           'links': [{'href': 'htt.....m', 'rel': 'me'}],
           'listedCount': 0,
           'objectType': 'person',
           'postedTime': '2007-01-09T02:53:35.000Z',
           'preferredUsername': 'RIMarkable',
           'statusesCount': 0,
           'summary': 'The Official, Unofficial BlackBerry Weblog',
           'twitterTimeZone': 'Eastern Time (US & Canada)',
           'utcOffset': '0',
           'verified': False},
 'body': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht...qo',
 'generator': {'displayName': 'twitterfeed', 'link': 'htt......om'},
 'gnip': {'matching_rules': [{'tag': None, 'value': '"JP Morgan"'}]},
 'id': 'tag:search.twitter.com,2005:66178882',
 'link': 'ht...82',
 'object': {'id': 'object:search.twitter.com,2005:66178882',
            'link': 'ht.....82',
            'objectType': 'note',
            'postedTime': '2007-05-16T19:00:24.000Z',
            'summary': 'Jim Balsillie To Present At JP Morgan Technology Conference: Research in Motion co-CEO, Jim Balsillie,.. ht......qo'},
 'objectType': 'activity',
 'postedTime': '2007-05-16T19:00:24.000Z',
 'provider': {'displayName': 'Twitter',
              'link': 'ht......m',
              'objectType': 'service'},
 'retweetCount': 0,
 'twitter_entities': {'hashtags': [],
                      'urls': [{'expanded_url': None,
                                'indices': [105, 130],
                                'url': 'htt.......5qo'}],
                      'user_mentions': []},
 'verb': 'post'}

Can anybody help me with the code for this? Thanks a lot!

brandizzi
Mike Sal
  • The provided json isn't valid so it's hard to code your example. Please verify it and post it inside a code block – Schalton Sep 02 '18 at 17:41
  • What I provided is not JSON anymore. I have turned my JSON files to dict type. – Mike Sal Sep 02 '18 at 17:49
  • I tried to copy and paste it into my IDE, and Python didn't recognize it as a dictionary or JSON – Schalton Sep 02 '18 at 17:51
  • Your question depends on which information you want in your CSV file. Which data from the dict do you want? Which columns will be in the CSV file? – brandizzi Sep 04 '18 at 17:29
  • Possible duplicate of [How can I convert JSON to CSV?](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) – brandizzi Sep 04 '18 at 17:30

1 Answer

Because the nesting depth varies, this problem gets a little more complicated if you want to keep every value.

What I've done for this kind of issue is flatten the dictionary first:

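def flatten_dict(input_dict, parent_key=""):
    """Flatten nested dicts and lists into one single-level dict."""
    flat_dict = {}
    for k, v in input_dict.items():
        # Prefix nested keys so that e.g. actor.id and the top-level id don't collide
        key = "{}.{}".format(parent_key, k) if parent_key else k
        if isinstance(v, dict):
            # Recurse into the nested dict and merge its flattened keys
            flat_dict.update(flatten_dict(v, key))
        elif isinstance(v, (list, tuple)):
            for index, item in enumerate(v):
                item_key = "{}-{}".format(key, index)
                if isinstance(item, dict):
                    flat_dict.update(flatten_dict(item, item_key))
                else:
                    flat_dict[item_key] = item
        elif v is None or isinstance(v, (str, int, float)):
            # bool is a subclass of int, so True/False land here too
            flat_dict[key] = v
        else:
            print("unknown type, add handling for: {}".format(type(v)))
    return flat_dict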

Then I use the first flattened record to create the header row (row1 stands for the first dict):

header_row = list(flatten_dict(row1))
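
One caveat: a header built from the first record alone will miss any key that only shows up in later records (for example a second entry in a list). If all the flattened rows are available up front, a minimal sketch like the following could collect every key in first-seen order. The name `collect_headers` and the sample rows are hypothetical and only for illustration:

```python
def collect_headers(flat_rows):
    """Return every key seen in any flattened row, in first-seen order."""
    headers = {}
    for flat_row in flat_rows:
        for key in flat_row:
            # Regular dicts keep insertion order (Python 3.7+), so this
            # doubles as an ordered set of column names.
            headers.setdefault(key, None)
    return list(headers)


# Hypothetical, already-flattened rows used only for illustration
sample_rows = [
    {"id": "a", "verb": "post"},
    {"id": "b", "retweetCount": 0},
]
header_row = collect_headers(sample_rows)
print(header_row)  # ['id', 'verb', 'retweetCount']
```

The trade-off is that this needs a first pass over the data before any row is written, so it does not fit a strictly one-pass workflow.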

Then I print the header row as the first line of the CSV:

print(",".join(header_row))

After that, I print the data for each JSON row in the same column order:

for row in rows:
    flat_row = flatten_dict(row)
    # str() is needed because join() only accepts strings; missing keys become ""
    print_row = ",".join(str(flat_row.get(header, "")) for header in header_row)
    print(print_row)
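
Joining with a plain "," breaks as soon as a value contains a comma or a quote, which tweet text often does. As a hedged alternative, the standard library's `csv.DictWriter` handles quoting and missing keys, and it writes one row at a time, so rows can be written as soon as each dict is produced. This is only a sketch: `write_rows`, the sample rows, and the in-memory buffer are hypothetical stand-ins, and in practice the buffer would be a file opened with `open(path, "w", newline="")`:

```python
import csv
import io


def write_rows(flat_rows, header_row, out_file):
    """Write flattened dicts to out_file as CSV, one row at a time."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=header_row,
        restval="",             # value used when a row lacks a column
        extrasaction="ignore",  # skip keys that are not in the header
    )
    writer.writeheader()
    for flat_row in flat_rows:  # flat_rows may be a generator
        writer.writerow(flat_row)


# Hypothetical flattened rows; note the comma inside the 'body' value
sample_rows = [
    {"id": "1", "body": "Hello, world"},
    {"id": "2"},
]
buffer = io.StringIO()  # stands in for a real output file
write_rows(sample_rows, ["id", "body"], buffer)
print(buffer.getvalue())
```

Because `flat_rows` can be a generator, the rows do not all have to sit in memory at once; only the header has to be known before the first row is written.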
Schalton