Extract DOI from IEEE Xplore website using Python code

Question

I am unable to extract a field from a web page. This is not an ordinary web scraping problem, because the data is tied to JavaScript. I also tried python-requests, but could not solve it.

I am trying to extract the DOI from the web page. The DOI sits inside the page's JavaScript. I can read the page, and the code works up to print(soup). The problem starts when I try to extract the DOI value. For the example page, the source contains "doi":"10.1109/LAWP.2014.2364296", and I want to print 10.1109/LAWP.2014.2364296 as extracted from the page.

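import urllib.request  # urlopen lives in the urllib.request submodule
from bs4 import BeautifulSoup

web_page = 'https://ieeexplore.ieee.org/document/6933872'
page = urllib.request.urlopen(web_page)
soup = BeautifulSoup(page, 'html.parser')
print(soup)  # works up to here
# Does not find the DOI: this matches only text nodes that are exactly 'doi'
print(soup.body.find_all(string='doi'))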

For the web page "https://ieeexplore.ieee.org/document/6933872", the expected output is 10.1109/LAWP.2014.2364296. How can I do this?

Check out https://html.python-requests.org/; it has full JavaScript support — liamhawkins, Feb 09 '19 at 00:34
Possible duplicate of [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) — liamhawkins, Feb 09 '19 at 00:35
I went through the [link](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python), but my case is different. The DOI is different for each paper, and I only want to extract that value — Nikita Sharma, Feb 09 '19 at 00:38
Executing the line r.html.render() raises an error. Is there any other way? — Nikita Sharma, Feb 09 '19 at 00:52
You should be submitting new questions for specific programming related issues you are encountering with this new direction. — liamhawkins, Feb 11 '19 at 18:19

score 1 · Answer 1 · answered Feb 11 '19 at 01:52

A possible solution that sidesteps the JavaScript scraping issue entirely is to use the IEEE API (https://developer.ieee.org/). It requires registration and approval to get an API key, but once you have one, it is much easier to send in a batch of IEEE article numbers and get back their DOIs and other metadata in a structured form.
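
If waiting for an API key is not practical, note that the question says the DOI appears in the page source as "doi":"10.1109/LAWP.2014.2364296". In that case a regular expression over the raw HTML may be enough, with no JavaScript execution needed. The following is only a minimal sketch, assuming the page still embeds the metadata in that form and that the server answers a plain request. The names `extract_doi` and `fetch_doi` and the `User-Agent` header are illustrative assumptions, not part of any library or of the IEEE site's documented behavior.

```python
import re
import urllib.request
from typing import Optional

# Matches the key/value pair as quoted in the question,
# e.g. "doi":"10.1109/LAWP.2014.2364296"
DOI_PATTERN = re.compile(r'"doi"\s*:\s*"([^"]+)"')


def extract_doi(html: str) -> Optional[str]:
    """Return the first DOI found in the page source, or None if absent."""
    match = DOI_PATTERN.search(html)
    if match is None:
        return None
    # Embedded JSON sometimes escapes slashes as \/, so undo that.
    return match.group(1).replace("\\/", "/")


def fetch_doi(url: str) -> Optional[str]:
    """Download the page and pull the DOI out of its raw HTML."""
    # Assumption: a browser-like User-Agent may be needed by the server.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_doi(html)


if __name__ == "__main__":
    # Offline check against the snippet quoted in the question.
    sample = '{"title":"Example","doi":"10.1109/LAWP.2014.2364296"}'
    print(extract_doi(sample))
```

Calling `fetch_doi('https://ieeexplore.ieee.org/document/6933872')` would apply the same pattern to the live page. Whether that succeeds depends on how the site serves the page at the time, so the API remains the more dependable route.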
