I am working with the Yelp reviews dataset and trying to convert it from JSON to CSV, based on the sample code from GitHub: https://github.com/lovingawareness/Yelp-Challenge-Dataset/blob/master/convert.py but it fails and I cannot figure out why. Could you please help me with it? I would really appreciate it.

import os
os.chdir(r"C:\Users\jiang\OneDrive\Desktop\yelp_dataset")

import json
import pandas as pd
from glob import glob

def convert(x):
    ''' Convert a json string to a flat python dictionary
    which can be passed into Pandas. '''
    ob = json.loads(x)
    for k, v in list(ob.items()):  # iterate over a copy, since the loop adds and deletes keys
        if isinstance(v, list):
            ob[k] = ','.join(v)
        elif isinstance(v, dict):
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob

for json_filename in glob('yelp_academic_dataset_review.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print('Converting %s to %s' % (json_filename, csv_filename))
    df = pd.DataFrame([convert(line) for line in file(json_filename)])
    df.to_csv(csv_filename, encoding='utf-8', index=False)


After changing 'file' to 'open', it still produces a long error (shown in a screenshot).

This problem was solved by following Michael Butscher's advice: only one line in the last block needs to change, so that 'open' is called with encoding='utf-8'. Thank you very much.

for json_filename in glob('yelp_academic_dataset_review.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print('Converting %s to %s' % (json_filename, csv_filename))
    df = pd.DataFrame([convert(line) for line in open(json_filename, encoding='utf-8')])
    df.to_csv(csv_filename, encoding='utf-8', index=False)
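
The text does not show the traceback, so the following is only a guess at why the explicit encoding helps. It assumes the error was a decoding problem: without an encoding argument, open() uses the platform's default encoding, which on Windows is often not UTF-8. This minimal sketch uses a made-up file name and invented records in the one-JSON-object-per-line layout. It shows that reading with encoding='utf-8' round-trips non-ASCII text, while decoding the same bytes with another codec does not.

```python
import json
import os
import tempfile

# Hypothetical sample records (invented for illustration); the text contains
# non-ASCII characters, written here as escapes.
records = [
    {"review_id": "r1", "stars": 5, "text": "Cr\u00e8me br\u00fbl\u00e9e was great"},
    {"review_id": "r2", "stars": 3, "text": "Just okay"},
]

tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_review.json")  # made-up name

# Write one JSON object per line, encoded as UTF-8.
with open(sample_path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Read with an explicit encoding so the result does not depend on the
# platform default.
with open(sample_path, encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]

# Decode the same bytes with a different codec to show the mismatch.
with open(sample_path, "rb") as f:
    mis_decoded = f.read().decode("latin-1")

print(parsed == records)
print("Cr\u00e8me" in mis_decoded)
```

With the explicit encoding the parsed records match what was written. The latin-1 decoding silently garbles the accented characters, and other codecs may raise an error instead.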
