I'm stuck in a conundrum of optimization versus the nature of the program. I have code that extracts information from an API and inserts it directly into a MongoDB database. The code I've posted only operates on 4 pages of the API, and it runs rather quickly. However, the final program needs to work reasonably well on 40 pages, and as of now it seems to stop after 5. To be clear, it reports that it has completed, but it has only collected data from 5.

To ensure the right information is placed in the right collection (the collections are named from the extracted data itself, not manually), the code is built on a series of nested for loops that are quite slow and pretty hideous to behold. I've been whacking at this for a while, and I'm having trouble coming up with any other approach that gathers the information accurately and puts it in the right place. Again, I'm looking to reduce the number of nested loops.

My API key is blocked, so this code will not run as posted. The API is NCBO's BioPortal, and you can look at their API here: http://data.bioontology.org/
Thanks!
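One way to reduce the visible nesting (not to reduce the actual work) is to move the ontology → page → class walk into a generator, so the insertion code becomes a single loop. The sketch below is a hypothetical refactoring written for Python 3; the names `iter_pages` and `iter_classes` are made up for illustration. It assumes, as the posted code does, that each page has a `collection` list and a `links.nextPage` URL that is null on the last page. The fetch function is passed in as a parameter so the pagination logic can be exercised offline against fake pages.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Any function that takes a URL and returns the parsed JSON page.
JsonFetcher = Callable[[str], dict]


def iter_pages(get_json: JsonFetcher, first_url: str) -> Iterator[dict]:
    """Yield every page of a paginated response by following links.nextPage."""
    url: Optional[str] = first_url
    while url:
        page = get_json(url)
        yield page
        # Assumed: nextPage is None (JSON null) or missing on the last page.
        url = page.get("links", {}).get("nextPage")


def iter_classes(get_json: JsonFetcher, rest_url: str,
                 acronyms: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Flatten ontologies -> pages -> classes into one stream of (acronym, class)."""
    for acronym in acronyms:
        first_url = rest_url + "/ontologies/" + acronym + "/classes"
        for page in iter_pages(get_json, first_url):
            for ont_class in page["collection"]:
                yield acronym, ont_class


if __name__ == "__main__":
    # Fake two-ontology response standing in for the real BioPortal API.
    fake_pages = {
        "http://example.org/ontologies/A/classes": {
            "collection": [{"prefLabel": "a1"}],
            "links": {"nextPage": "http://example.org/ontologies/A/classes?page=2"},
        },
        "http://example.org/ontologies/A/classes?page=2": {
            "collection": [{"prefLabel": "a2"}],
            "links": {"nextPage": None},
        },
        "http://example.org/ontologies/B/classes": {
            "collection": [{"prefLabel": "b1"}],
            "links": {"nextPage": None},
        },
    }
    for acronym, ont_class in iter_classes(fake_pages.__getitem__,
                                           "http://example.org", ["A", "B"]):
        print(acronym, ont_class["prefLabel"])
```

With the real `get_json`, `REST_URL`, and `onts_acronyms` from the posted code, the main script would reduce to one `for acronym, ont_class in iter_classes(...)` loop that writes each class into `db[acronym]`. The number of HTTP requests and inserts stays the same; only the structure changes.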
```python
import urllib2
import json
from pymongo import MongoClient
from datetime import datetime

REST_URL = "http://data.bioontology.org"
API_KEY = "********"

client = MongoClient()
db = client.db
print "Accessed database."

def get_json(url):
    opener = urllib2.build_opener()
    opener.addheaders = [('Authorization', 'apikey token=' + API_KEY)]
    return json.loads(opener.open(url).read())

# Get all ontologies from the REST service and parse the JSON
all_ontologies = get_json(REST_URL + "/ontologies")
selected_ontologies = ['MERA', 'OGROUP', 'GCO', 'OCHV']
onts_acronyms = []
page = None
acronym = None

for ontology in all_ontologies:
    if ontology["acronym"] in selected_ontologies:
        # Convert the unicode acronym to a plain str for use as a collection name
        onts_acronyms.append(str(ontology["acronym"]))

for acronym in onts_acronyms:
    page = get_json(REST_URL + "/ontologies/" + acronym + "/classes")
    next_page = page
    while next_page:
        next_page = page["links"]["nextPage"]
        # One insert (one round trip to MongoDB) per class on this page
        for ont_class in page["collection"]:
            result = db[acronym].insert(
                {ont_class["prefLabel"]:
                    {"definition": ont_class["definition"],
                     "synonyms": ont_class["synonym"]}},
                check_keys=False)
        if next_page:
            page = get_json(next_page)

print "DB Built."
```
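Since the inner loop issues one `insert` per class, a possible speed-up is to build all documents for a page first and send them with a single `insert_many` call (available on pymongo `Collection` objects from pymongo 3 onward). The sketch below is a hypothetical Python 3 helper, not part of the posted code: `page_to_docs` and `store_page` are illustrative names, and the document shape (storing `prefLabel` as a value rather than as a field name) is an assumed alternative schema. That schema also avoids needing `check_keys=False`, which the posted code uses to get labels containing characters such as `.` past pymongo's key checks. The collection is duck-typed, so any object with an `insert_many` method works.

```python
from typing import Any, Dict, List


def page_to_docs(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one page of classes into flat documents (hypothetical schema)."""
    return [
        {
            "prefLabel": ont_class.get("prefLabel"),
            # Assumed defaults in case a class has no definition/synonym field.
            "definition": ont_class.get("definition", []),
            "synonyms": ont_class.get("synonym", []),
        }
        for ont_class in page.get("collection", [])
    ]


def store_page(collection: Any, page: Dict[str, Any]) -> int:
    """Send a whole page in one insert_many call; return the number of docs sent."""
    docs = page_to_docs(page)
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

Inside the existing `while` loop, `store_page(db[acronym], page)` would take the place of the inner `for ont_class in page["collection"]` loop, cutting the database round trips from one per class to one per page. Whether this is the bottleneck (as opposed to the HTTP requests) would need to be measured.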