I am trying to create a list of all the newspaper articles from 5 different sources. They are stored in JSON
format. The articles are stored in separate files, each named after the newspaper and the year (time span 2005-2015). The problem is that one of the newspapers is only available for 2014-15, so when I loop over everything together I get an error. This is my attempt:
import json
import nltk
import re
import pandas
appended_data = []
for i in range(2005,2016):
    df0 = pandas.DataFrame([json.loads(l) for l in open('SDM_%d.json' % i)])
    df1 = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)])
    df2 = pandas.DataFrame([json.loads(l) for l in open('APJ_%d.json' % i)])
    df3 = pandas.DataFrame([json.loads(l) for l in open('TH500_%d.json' % i)])
    df4 = pandas.DataFrame([json.loads(l) for l in open('DRSM_%d.json' % i)])
    appended_data.append(df0)
    appended_data.append(df1)
    appended_data.append(df2)
    appended_data.append(df3)
    appended_data.append(df4)
appended_data = pandas.concat(appended_data)
doc_set = appended_data.body
My question is: does this code do what I am aiming for (creating a single list with the body
of all articles from each newspaper over time)? And how can I program it so that it skips the years 2005-2013 for the first newspaper (SDM)?
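
For reference, here is a minimal sketch of one way the missing years might be skipped. It is not a definitive solution. It assumes every file holds one JSON object per line, as the attempt above implies, and that SDM starts in 2014 while the other four sources start in 2005. The names `FIRST_YEAR` and `load_articles` are hypothetical, introduced only for this illustration. Files that do not exist are skipped as well, so a gap in any other source would not raise an error either.

```python
import json
import os

import pandas

# Assumed first available year per source (SDM from 2014, as described above).
FIRST_YEAR = {'SDM': 2014, 'Scot': 2005, 'APJ': 2005, 'TH500': 2005, 'DRSM': 2005}


def load_articles(directory='.', last_year=2015):
    """Return one DataFrame with the articles from every file that exists."""
    frames = []
    for source, first_year in FIRST_YEAR.items():
        for year in range(first_year, last_year + 1):
            path = os.path.join(directory, '%s_%d.json' % (source, year))
            if not os.path.exists(path):
                continue  # no file for this source and year, so skip it
            with open(path) as handle:
                # One JSON object per non-empty line.
                rows = [json.loads(line) for line in handle if line.strip()]
            frames.append(pandas.DataFrame(rows))
    if not frames:
        return pandas.DataFrame()  # pandas.concat raises on an empty list
    return pandas.concat(frames, ignore_index=True)
```

With this sketch, `load_articles().body` would be a pandas Series rather than a plain list, and `load_articles().body.tolist()` would turn it into a Python list of article bodies.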