I have a list of reaction names that I want to look up in ModelSeed (basically "https://modelseed.org/biochem/reactions/" + reaction name). For each name, I then want to get the KEGG pathway.

For instance, for the reaction "rxn00020", the function would go to https://modelseed.org/biochem/reactions/rxn00020 and from there give me "KEGG: rn00500 (Starch and sucrose metabolism)". I tried following this thread but didn't manage to get anything done... Can you help me? Thanks a lot!

Tor Tor

2 Answers

The page contents are loaded dynamically, so a plain HTTP request for the page will not contain them. One way to scrape them is Selenium, which drives a real browser. Here is how you do it:

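from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

# List of all your URLs
urls = ['https://modelseed.org/biochem/reactions/rxn00020']

for url in urls:
    driver.get(url)
    time.sleep(1.5)  # give the page's JavaScript time to render
    # Selenium 4 locator API; the find_elements_by_* helpers were removed in 4.3
    kegg = driver.find_elements(By.CLASS_NAME, 'ng-binding')[-2]
    print(kegg.text)

driver.quit()  # close the browser when done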

Output:

KEGG: rn00500 (Starch and sucrose metabolism)
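
If you need the pathway ID and the pathway name separately, here is a minimal sketch using a regular expression. It assumes the scraped text always has the shape shown in the output above; parse_kegg and KEGG_PATTERN are hypothetical names chosen for illustration, not part of Selenium or ModelSeed.

```python
import re

# Assumed shape of the scraped text, taken from the output above:
#   "KEGG: rn00500 (Starch and sucrose metabolism)"
KEGG_PATTERN = re.compile(r"KEGG:\s*(rn\d+)\s*\((.+)\)")


def parse_kegg(text):
    """Return (pathway_id, pathway_name), or None if the text does not match."""
    match = KEGG_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(parse_kegg("KEGG: rn00500 (Starch and sucrose metabolism)"))
```

Inside the loop you could call parse_kegg(kegg.text) instead of printing the raw text, and skip reactions for which it returns None.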
Sushil

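The data you want is loaded through XHR requests; you can see them in the Network tab of your browser's web inspector. You can query the same endpoint directly with requests. After running the code below, res contains what you want.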

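import requests as rq

reaction_names = ["rxn00020"]
res = {}  # maps each reaction name to its pathways
# Endpoint seen in the XHR requests made by the reaction page
base_url = "https://modelseed.org/solr/reactions/select?wt=json&q=id:"

for reac_name in reaction_names:
    resp = rq.get(base_url + reac_name).json()
    # The first matching document holds the pathways of the reaction
    res[reac_name] = resp['response']['docs'][0]['pathways']

print(res)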
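
With a long list of names, indexing docs[0] will raise an IndexError as soon as one lookup returns no documents. Here is a minimal sketch of a more defensive lookup, assuming that an unknown reaction id comes back with an empty docs list. extract_pathways is a hypothetical helper, and the payloads below are hand-made stand-ins that only mimic the structure used above, not real server output.

```python
def extract_pathways(payload):
    """Return the 'pathways' field of the first document, or None if absent."""
    docs = payload.get("response", {}).get("docs", [])
    if not docs:
        # Assumption: an unknown reaction id yields an empty docs list
        return None
    return docs[0].get("pathways")


# Hand-made payloads for illustration only (the real field value may differ)
found = {"response": {"docs": [{"pathways": "KEGG: rn00500 (Starch and sucrose metabolism)"}]}}
missing = {"response": {"docs": []}}

print(extract_pathways(found))
print(extract_pathways(missing))
```

In the loop, res[reac_name] = extract_pathways(resp) would then store None for names that cannot be resolved instead of stopping the whole run.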
ce.teuf