
I'm trying to get the text from the following HTML: `<span id="numrefs">(0)</span>`. My code is as follows:

# soup is a BeautifulSoup object built from the page's HTML
container = soup.findAll("span", {"id":"numrefs"})
print(container)
print(container[0].text)

I think I should be able to get the text, which is (0), using the second print. But the result of my code is as follows:

[<span id="numrefs"></span>]

I could not get any text from the span tag: the second print printed nothing. What is happening here?

  • The information you have provided so far is not enough to tell why your output is missing the desired values. It may be that the content of that site is generated dynamically. – SIM May 31 '18 at 11:12
  • Yes, this number is actually the number of times an academic paper has been cited. So I assume it is dynamic, although I don't know much about HTML. Is it possible to scrape this sort of data? – zihao zhao May 31 '18 at 11:16
  • Sure, there are always alternatives. In fact, if you check out the bounty questions, you will find several answers there on how to deal with dynamic sites. Check out [this one](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – SIM May 31 '18 at 11:18
  • Thank you. I checked that answer. It seems that there was a controlling script in that case. In my case, the text (0) is originally in the span tag. This is the webpage I am trying to scrape: [link](https://www.osapublishing.org/ol/abstract.cfm?uri=ol-43-1-1). What I am trying to scrape is the cited number on the left side of the page. Do you think this is the same as in that case? – zihao zhao May 31 '18 at 11:41
  • I could not find anything with the id `numrefs` on that webpage. Check out [this link](https://www.dropbox.com/s/q4zvpt09pus04na/Untitled.jpg?dl=0) – SIM May 31 '18 at 12:13
  • That's interesting, maybe it's because of the browser. Anyway, I think it is probably the dynamic data issue. I will do my research on this. Thank you! – zihao zhao May 31 '18 at 13:30
  • I think I found the reason for this. The cited number is available to subscribers only. If you are not a subscriber, you will not see those numbers. I just realized this after I came back home. I can see those numbers using my university Internet access. However, I still cannot scrape that data even when I'm using the university Internet. Is this dynamic data? – zihao zhao May 31 '18 at 14:11
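
The printed `[<span id="numrefs"></span>]` suggests that the tag was present in the HTML BeautifulSoup parsed, but the text `(0)` was not. One way to narrow this down is to check what the downloaded HTML actually contains for that span. Below is a minimal sketch of such a check, not a definitive diagnosis: `diagnose_span` is a hypothetical helper name, and the HTML strings are made-up stand-ins for the page, not the real response from the site.

```python
from bs4 import BeautifulSoup


def diagnose_span(html, span_id="numrefs"):
    """Report what the given HTML holds for <span id=span_id>.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"id": span_id})
    if span is None:
        return "missing"  # the tag is not in this HTML at all
    text = span.get_text(strip=True)
    if not text:
        return "empty"    # the tag is there, but its text is not in this HTML
    return text


# Made-up inputs standing in for a downloaded page.
as_seen_in_browser = '<span id="numrefs">(0)</span>'
as_received_by_script = '<span id="numrefs"></span>'

print(diagnose_span(as_seen_in_browser))
print(diagnose_span(as_received_by_script))
print(diagnose_span("<p>no span here</p>"))
```

If the real response gives "empty", the parsing code is fine and the text simply was not in the HTML the script received. That would be consistent with the value being filled in after the page loads, or with the server sending different HTML to the script than to the browser, but this check alone cannot tell those two apart.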

0 Answers