I'm trying to load ten separate directories, each containing a bunch of JSON files; the structure is shown below:
import os
import json
import pandas as pd

data = []
for fpathe1, dirs1, fs1 in os.walk('../input/charliehebdo/rumours/'):
    for f in fs1:
        with open(os.path.join(fpathe1, f)) as dir_loc:
            data.append(json.load(dir_loc))
charliehebdo = pd.DataFrame(data)
charliehebdo['label'] = 'TRUE'
charliehebdo['event'] = 'charliehebdo'

data = []
for fpathe2, dirs2, fs2 in os.walk('../input/charliehebdo/non-rumours/'):
    for f in fs2:
        with open(os.path.join(fpathe2, f)) as dir_loc:
            data.append(json.load(dir_loc))
nonRumourcharliehebdo = pd.DataFrame(data)
nonRumourcharliehebdo['label'] = 'FALSE'
nonRumourcharliehebdo['event'] = 'charliehebdo'

data = []
for fpathe3, dirs3, fs3 in os.walk('../input/ferguson/rumours/'):
    for f in fs3:
        with open(os.path.join(fpathe3, f)) as dir_loc:
            data.append(json.load(dir_loc))
ferguson = pd.DataFrame(data)
ferguson['label'] = 'TRUE'
ferguson['event'] = 'ferguson'

data = []
for fpathe4, dirs4, fs4 in os.walk('../input/ferguson/non-rumours/'):
    for f in fs4:
        with open(os.path.join(fpathe4, f)) as dir_loc:
            data.append(json.load(dir_loc))
nonRumourferguson = pd.DataFrame(data)
nonRumourferguson['label'] = 'FALSE'
nonRumourferguson['event'] = 'ferguson'
However, this code is extremely time-consuming (it took over 24 hours on my laptop with an Intel Core i7-4720HQ), so I'm wondering if there's a better solution?
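Since the four blocks only differ in path, label, and event name, I suppose they could be folded into one helper. Here is an untested sketch of what I mean (`load_event` is just a placeholder name I made up, and the demo uses a temporary directory instead of the real `../input/...` paths):

```python
import json
import os
import tempfile

import pandas as pd

def load_event(root, label, event):
    """Walk `root` once, parse every JSON file, then build the frame in one go."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name)) as fh:
                records.append(json.load(fh))
    df = pd.DataFrame(records)
    df['label'] = label
    df['event'] = event
    return df

# Tiny demo with made-up records; the real calls would use the ../input/... paths.
with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        with open(os.path.join(root, f'{i}.json'), 'w') as fh:
            json.dump({'id': i, 'text': f'tweet {i}'}, fh)
    df = load_event(root, 'TRUE', 'charliehebdo')
    print(df.shape)
```

This at least guarantees each frame starts from an empty list, and appending to a list before a single `pd.DataFrame(...)` call avoids rebuilding the frame repeatedly.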
Well, it seems that my structure figure confused or misled you, so here is the dataset itself: raw dataset.
I had intended to illustrate the dataset with a figure, but that turned out to make things worse.