Getting text from a website for each name in a pandas list (Python)

Question

I have a list of reaction names that I want to look up in ModelSEED (basically "https://modelseed.org/biochem/reactions/" + reaction name), and for each name I want to get its KEGG pathway.

For instance, for the reaction "rxn00020", the function would go to https://modelseed.org/biochem/reactions/rxn00020 and return "KEGG: rn00500 (Starch and sucrose metabolism)". I tried following this thread but didn't manage to get anything working. Can you help me? Thanks a lot!
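Since the names are held in pandas, building the per-reaction URLs takes one line. A minimal sketch, assuming a DataFrame `df` with a hypothetical `reaction` column (the second ID is only an illustrative placeholder):

```python
import pandas as pd

BASE_URL = "https://modelseed.org/biochem/reactions/"

# Hypothetical input: one reaction ID per row (the second ID is a placeholder).
df = pd.DataFrame({"reaction": ["rxn00020", "rxn00001"]})

# str + Series concatenates element-wise, giving one page URL per reaction.
df["url"] = BASE_URL + df["reaction"]

for url in df["url"]:
    print(url)
```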

Accepted answer by Sushil (2020-10-26 11:54)

The page content is loaded dynamically by JavaScript, so a plain HTTP request does not see it. You can use Selenium, which drives a real browser, to scrape it:

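from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

urls = ['https://modelseed.org/biochem/reactions/rxn00020']  # list of all your URLs

for url in urls:
    driver.get(url)
    time.sleep(1.5)  # give the page's JavaScript time to render the data
    # Selenium 4.3+ removed find_elements_by_class_name; use find_elements(By...) instead.
    # On the page as it was in 2020, the KEGG line is the second-to-last 'ng-binding' element.
    kegg = driver.find_elements(By.CLASS_NAME, 'ng-binding')[-2]
    print(kegg.text)

driver.quit()  # close the browser when done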

Output:

KEGG: rn00500 (Starch and sucrose metabolism)
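Indexing with [-2] depends on the exact page layout. A slightly more robust sketch, assuming the KEGG line is still rendered as text starting with "KEGG:", filters the element texts by that prefix and then splits off the pathway ID and name. The helpers `pick_kegg_text` and `parse_kegg` are hypothetical, and the example runs offline on a hand-written list instead of a live page:

```python
import re

# Hypothetical pattern for strings like "KEGG: rn00500 (Starch and sucrose metabolism)".
KEGG_RE = re.compile(r"KEGG:\s*(rn\d+)\s*\((.+)\)")


def pick_kegg_text(texts):
    """Return the first text starting with 'KEGG:', or None if there is none."""
    for text in texts:
        stripped = text.strip()
        if stripped.startswith("KEGG:"):
            return stripped
    return None


def parse_kegg(text):
    """Split a KEGG line into (pathway_id, pathway_name), or None if it does not match."""
    match = KEGG_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


# With Selenium you would pass the element texts instead, e.g.
# texts = [e.text for e in driver.find_elements(By.CLASS_NAME, "ng-binding")]
texts = ["rxn00020", "KEGG: rn00500 (Starch and sucrose metabolism)", "other label"]
line = pick_kegg_text(texts)
print(line)              # KEGG: rn00500 (Starch and sucrose metabolism)
print(parse_kegg(line))  # ('rn00500', 'Starch and sucrose metabolism')
```

Matching on the visible text rather than on position means an extra element added to the page does not silently return the wrong line.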

score 0 · Answer 2 · answered Oct 26 '20 at 11:59

Take a look at the Network tab of your browser's web inspector: the data you want is fetched through XHR requests, so you can query that endpoint directly. After the loop below, res maps each reaction name to its pathways.

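import requests as rq

reaction_names = ["rxn00020"]
res = {}
# Solr endpoint that the ModelSEED page queries in the background (visible in the Network tab)
base_url = "https://modelseed.org/solr/reactions/select?wt=json&q=id:"

for reac_name in reaction_names:
    resp = rq.get(base_url + reac_name, timeout=30)
    resp.raise_for_status()  # fail loudly on HTTP errors
    docs = resp.json()['response']['docs']
    # take the 'pathways' field of the first matching document, or None if nothing matched
    res[reac_name] = docs[0]['pathways'] if docs else None

print(res)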
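If an ID has no match, `docs` comes back empty and `docs[0]` would raise IndexError. A hedged sketch that tolerates empty results and writes the values back into a pandas DataFrame follows. The helper `extract_pathways`, the column names, and the payloads are hypothetical stand-ins; only the `response` → `docs` → `pathways` layout comes from the code above, and the real `pathways` values may look different:

```python
import pandas as pd


def extract_pathways(payload):
    """Return the 'pathways' field of the first Solr document, or None if there is none."""
    docs = payload.get("response", {}).get("docs", [])
    if not docs:
        return None  # no document matched this reaction ID
    return docs[0].get("pathways")


# Hand-written payloads standing in for resp.json() results (hypothetical values).
payloads = {
    "rxn00020": {"response": {"docs": [{"pathways": ["example pathway"]}]}},
    "rxn_unknown": {"response": {"docs": []}},
}
res = {name: extract_pathways(p) for name, p in payloads.items()}

# Map the results back onto the DataFrame that holds the reaction names.
df = pd.DataFrame({"reaction": ["rxn00020", "rxn_unknown"]})
df["pathways"] = df["reaction"].map(res.get)
print(df)
```

Keeping the parsing in a small function makes it easy to test against saved responses without hitting the server.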

Answered by ce.teuf

Hi! Thanks for the help, but I'm afraid it did not work for me. Anyway, the problem is solved! – Tor Tor, Oct 26 '20 at 15:19
