I am trying to write a program that scrapes the IUPAC condensed representation from this webpage.

Here G03307GF is the ID. I need this:

HexNAc(b1-?)[Fuc(a1-?)]GlcNAc(b1-2)Man(a1-3)[HexNAc(b1-?)[Fuc(a1-?)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc

I tried to use selenium for this.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome('', options = chrome_options)

# takes accession number and returns IUPAC
def getIUPAC(acc_no):

    url = 'https://glytoucan.org/Structures/Glycans/' + acc_no

    driver.get(url)
    IUPAC = driver.find_element_by_xpath('//*[@id="descriptors"]/togostanza-iupaccondensed//main/div/pre/code/text()')
    driver.close()

    return IUPAC

IUPAC = getIUPAC('G37498VS')

print(IUPAC)

It says that the element does not exist.

Smart Manoj
Shaurya

2 Answers

import re
import requests

def getIUPAC(acc_no):
    ret = requests.get('https://glytoucan.org/Structures/Glycans/{}'.format(acc_no))
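    # Use ret.text (str) rather than ret.content (bytes): a str pattern cannot search bytes in Python 3.
    # The lazy (.+?) stops at the first ". The" after the IUPAC string.
    z = re.search(r'<meta name="description".*?The IUPAC representation is (.+?)\.\s+The', ret.text, re.DOTALL | re.MULTILINE)
    # Return the captured group, not the match object itself.
    return z.group(1) if z else 'Unknown'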


print('IUPAC is {}'.format(getIUPAC('G03307GF')))

Our result is...

IUPAC is HexNAc(b1-?)[Fuc(a1-?)]GlcNAc(b1-2)Man(a1-3)[HexNAc(b1-?)[Fuc(a1-?)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
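
The regex step can also be checked offline, without requesting the page. This is only a minimal sketch: the sample HTML is made up to match the shape the regex above expects (the real page's markup may differ), and `extract_iupac`, `SAMPLE_HTML` and `PATTERN` are hypothetical names. It also accepts bytes, which sidesteps the str/bytes mismatch you hit when passing `ret.content` straight to `re`.

```python
import re

# Hypothetical sample: the real page's meta description may be laid out differently.
SAMPLE_HTML = (
    '<html><head>\n'
    '<meta name="description" content="Example entry. '
    'The IUPAC representation is Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc. '
    'The rest of the description follows.">\n'
    '</head></html>'
)

# Lazy groups so the capture stops at the first ". The" after the IUPAC string.
PATTERN = re.compile(
    r'<meta name="description".*?The IUPAC representation is (.+?)\.\s+The',
    re.DOTALL,
)


def extract_iupac(html):
    """Return the captured IUPAC string, or None if the pattern is not found."""
    # Decode bytes (e.g. requests' response.content) before applying a str pattern.
    if isinstance(html, bytes):
        html = html.decode('utf-8', errors='replace')
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(extract_iupac(SAMPLE_HTML))
print(extract_iupac(SAMPLE_HTML.encode('utf-8')))
print(extract_iupac('<html></html>'))
```

With a live page you would pass `ret.text` (or `ret.content`) from `requests.get(...)` to the same function.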
VeNoMouS
  • That worked! Thanks. I did not use 're' because there was some issue with bytes but the 'requests' thing worked. – Shaurya May 27 '19 at 21:21
It is better to use requests, as VeNoMouS shows. I just wanted to add that you are getting "element does not exist" because the driver was closed before you printed the result.

Alan