I am working with the Yelp reviews dataset and trying to convert it from JSON to CSV, based on this sample code from GitHub: https://github.com/lovingawareness/Yelp-Challenge-Dataset/blob/master/convert.py. However, it fails and I cannot figure out why. Could you please help me with it? I would really appreciate it.
```python
import os
import json
from glob import glob

import pandas as pd

os.chdir(r"C:\Users\jiang\OneDrive\Desktop\yelp_dataset")


def convert(x):
    ''' Convert a json string to a flat python dictionary
    which can be passed into Pandas. '''
    ob = json.loads(x)
    # Note: adding and deleting keys while iterating over ob.items()
    # can raise RuntimeError in Python 3 when a value is a nested dict.
    for k, v in ob.items():
        if isinstance(v, list):
            ob[k] = ','.join(v)
        elif isinstance(v, dict):
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob


for json_filename in glob('yelp_academic_dataset_review.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print('Converting %s to %s' % (json_filename, csv_filename))
    # file() is a Python 2 built-in; it does not exist in Python 3.
    df = pd.DataFrame([convert(line) for line in file(json_filename)])
    df.to_csv(csv_filename, encoding='utf-8', index=False)
```
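The `convert` function above modifies `ob` while looping over `ob.items()`. That was harmless in Python 2, where `items()` returns a list, but in Python 3 it can raise `RuntimeError` as soon as a nested dict is flattened. A minimal sketch of a Python 3-safe variant, assuming the same flattening rules as the sample code, loops over a `list()` snapshot of the items and converts list elements with `str()` so `join()` does not fail on numbers. The example record with a `votes` field is made up for illustration.

```python
import json


def convert(x):
    """Convert a JSON string to a flat dict that can be passed to pandas.

    Python 3-safe sketch of the sample code's convert(): the loop walks a
    list() snapshot of the items, so adding and deleting keys is allowed.
    """
    ob = json.loads(x)
    for k, v in list(ob.items()):
        if isinstance(v, list):
            # str() lets lists of numbers or None be joined as well.
            ob[k] = ','.join(map(str, v))
        elif isinstance(v, dict):
            # Flatten one level: {"votes": {"useful": 2}} -> {"votes_useful": 2}
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob
```

For example, `convert('{"stars": 5, "votes": {"useful": 2, "cool": 0}}')` returns `{'stars': 5, 'votes_useful': 2, 'votes_cool': 0}`. Only one level of nesting is flattened, matching the original function.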
After changing `file` to `open`, it still fails with a long error traceback, which I posted as a screenshot.
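This problem has been solved thanks to advice from Michael Butscher: only one line in the last chunk needed to change. The file has to be opened with `encoding='utf-8'`, because without an explicit encoding `open()` uses the platform's locale encoding, which on Windows is usually not UTF-8. Many thanks!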
```python
for json_filename in glob('yelp_academic_dataset_review.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print('Converting %s to %s' % (json_filename, csv_filename))
    # The fix: read the file explicitly as UTF-8 instead of the locale default.
    df = pd.DataFrame([convert(line) for line in open(json_filename, encoding='utf-8')])
    df.to_csv(csv_filename, encoding='utf-8', index=False)
```
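The list comprehension in the fixed loop holds every parsed record in memory before the DataFrame is built. As an alternative (not the approach used in the post), the rows can be streamed straight to CSV with the standard-library `csv` module while keeping the same explicit UTF-8 decoding. In this sketch, `json_lines_to_csv` is a hypothetical helper name, and it assumes every record has the same top-level keys as the first one.

```python
import csv
import json


def json_lines_to_csv(json_path, csv_path, parse=json.loads):
    """Stream a JSON-lines file to CSV without loading it all into memory.

    Hypothetical helper (not from the original post). `parse` turns one
    line into a flat dict; pass a flattening function such as the
    question's convert() if records contain nested objects.
    Assumes every record has the same keys as the first record.
    """
    count = 0
    # Explicit UTF-8 on both ends so the Windows locale encoding is never used.
    with open(json_path, encoding='utf-8') as src, \
            open(csv_path, 'w', encoding='utf-8', newline='') as dst:
        writer = None
        for line in src:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            record = parse(line)
            if writer is None:
                # The header is taken from the first record's keys; later
                # records with extra keys raise ValueError (DictWriter's
                # default) and missing keys are written as empty strings.
                writer = csv.DictWriter(dst, fieldnames=list(record))
                writer.writeheader()
            writer.writerow(record)
            count += 1
    return count
```

For instance, `json_lines_to_csv('yelp_academic_dataset_review.json', 'yelp_academic_dataset_review.csv', parse=convert)` would write the CSV next to the JSON file and return the number of rows written.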