Title pretty much. I click a button with Selenium that loads some text I'm trying to scrape, and when I use Inspect Element I can find what I'm looking for. However, no matter how long I wait, dumping `.page_source` or running `driver.execute_script("return document.documentElement.outerHTML")` (as suggested here and here) does not reflect the changes to the HTML.
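The waiting described in the question can be written as an explicit polling loop over the serialized DOM. This is a minimal sketch, not the asker's code. `wait_for_html` is a hypothetical helper, and `driver` is assumed to be a Selenium WebDriver on which the button has already been clicked. `needle` is any marker you expect to appear, for example the `secondarySection` class mentioned later in the thread. Selenium's own `WebDriverWait` is the usual way to wait for an element. This loop instead checks the HTML string itself, which is the thing that is not updating here.

```python
import time


def wait_for_html(driver, needle, timeout=10.0, poll=0.5):
    """Poll the serialized live DOM until `needle` appears in it.

    Returns the HTML string once it contains `needle`, or None if
    `timeout` seconds pass first. `driver` is assumed to be a Selenium
    WebDriver; anything with an `execute_script` method works.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Same call as in the question: serialize the current document.
        html = driver.execute_script("return document.documentElement.outerHTML")
        if needle in html:
            return html
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Hypothetical usage after clicking the button:
# html = wait_for_html(driver, "secondarySection", timeout=20)
# if html is None: the marker never appeared in outerHTML
```

If this always times out even though Inspect Element shows the text, then waiting is not the problem. The content is simply not part of what `outerHTML` serializes.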
-
Are you sure the content you need is actually in the page's source code? If you can, provide the URL you are working with. – ojonasplima Jan 06 '22 at 18:53
-
@osfresia Let's take https://sa.ucla.edu/ro/public/soc/Results?SubjectAreaName=Mathematics+(MATH)&t=22W&sBy=subject&subj=MATH+++ as an example. I click the Expand All Classes button, which loads the discussion section times. You can see that this loads the information into the page's HTML. – failedentertainment Jan 06 '22 at 18:54
-
On the page you used as an example, I was able to find the collapsed data that appears in the source with `driver.PageSource.ToString()`. Be aware that the information shown on the page is provided by `JavaScript` code, held in the `div id = divClassNames`, and is not inside the HTML itself. The source code does not contain the information displayed by the `JavaScript`. – ojonasplima Jan 06 '22 at 19:09
-
@osfresia I'm using the Python binding for Selenium, and `driver.page_source` already returns a string (there is no `ToString()` method). I cannot find the collapsed data in this string (say, the class `secondarySection`, which only appears when the button is clicked). I don't follow what you're saying about JavaScript. Could you elaborate, please? – failedentertainment Jan 06 '22 at 19:28
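A plain substring test such as `"secondarySection" in html` also matches longer class names that merely start with that text. A slightly stricter check parses the dumped string and counts elements whose `class` attribute contains that exact class. This is a minimal standard-library sketch. `ClassCounter` and `count_class` are hypothetical helper names, and the input is assumed to be the string returned by `driver.page_source` or by the `execute_script` call.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains an exact class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # `value` is None for a bare `class` attribute with no value.
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical usage:
# print(count_class(driver.page_source, "secondarySection"))
```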
-
Sorry about the method; I was testing in C# for a moment. At [this link](https://imgur.com/a/6sBM5JK) you can see the highlighted parts of the page's source code. Note that this is exactly where your text should be, right? The page's frontend calls that JavaScript to produce the text, so in the source code you can only see the script itself, not the result of your browser running it. To scrape that type of data you need a different approach. ++ – ojonasplima Jan 06 '22 at 19:40
-
Read [this article](https://thinkdiff.net/how-to-scrap-data-from-javascript-based-website-using-python-selenium-and-headless-web-driver-531c7fe0c01f) to understand the basics of what you need. I don't know whether I should have posted this as an answer, but I hope it helps. – ojonasplima Jan 06 '22 at 19:41
-
@osfresia If you look at https://imgur.com/a/ZLfXyjM, you can see that the desired text is in the HTML after I click the button. – failedentertainment Jan 06 '22 at 20:59
-
But that's not in the source code. That's the point. – ojonasplima Jan 06 '22 at 21:33
-
@osfresia Maybe I don't understand. What exactly do you mean by source code? I can see the text I'm trying to grab when I use Inspect Element; is that not the "source code"? – failedentertainment Jan 07 '22 at 20:19
-
No, it's not. Right-click the web page > Show source code. What appears on your screen then is what `.PageSource` will extract. Read the article I referenced before. I can't extend the discussion here because of the guidelines. – ojonasplima Jan 07 '22 at 20:23
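To check where a piece of markup actually lives, one option is to compare two strings. The first is the HTML exactly as the server sends it, with no JavaScript run, which is roughly what View Source shows. The second is the browser's live DOM. This is a minimal standard-library sketch under stated assumptions. The helper names are hypothetical, and the `User-Agent` value is an assumption (some servers reject urllib's default). The live HTML is assumed to come from the `execute_script(...)` call quoted in the question.

```python
from urllib.request import Request, urlopen


def fetch_served_html(url, timeout=30):
    """Return the HTML as the server sends it, without running JavaScript.

    The User-Agent header is an assumption; adjust it if the server refuses.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def where_is(needle, served_html, live_html):
    """Report whether `needle` occurs in the served HTML and/or the live DOM dump."""
    return {"served": needle in served_html, "live": needle in live_html}


# Hypothetical usage with a live Selenium `driver`:
# served = fetch_served_html(url)
# live = driver.execute_script("return document.documentElement.outerHTML")
# print(where_is("secondarySection", served, live))
```

A result of `{"served": False, "live": True}` would mean the markup was added by script after page load. A result of `False` for both, while Inspect Element still shows the markup, points to content that the `outerHTML` dump does not include at all.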
-
@osfresia Thanks for the help. I can't find anywhere in Selenium's documentation that says `.page_source` returns the original source code rather than the current DOM. Quotes from old versions of the documentation seem to indicate that `.page_source` is unreliable and often out of date. However, `driver.execute_script("return document.documentElement.outerHTML")` should return the current DOM, and it has exactly the same issue for me: elements that I find manually with Inspect Element do not show up in the returned string. – failedentertainment Jan 07 '22 at 23:55
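One possible explanation is that the content sits inside a shadow root. Nobody in the thread has confirmed this for this page, so treat it as an assumption. `outerHTML` does not serialize the contents of shadow roots or of iframe documents, although Inspect Element displays both. The sketch below asks the browser for the `innerHTML` of every *open* shadow root it can reach; closed shadow roots cannot be read this way. `dump_open_shadow_roots` and `any_shadow_root_contains` are hypothetical helper names, and `driver` is assumed to be a Selenium WebDriver.

```python
# JavaScript run inside the page: walk the document and every open shadow
# root, collecting each shadow root's serialized contents.
SHADOW_DUMP_JS = """
const out = [];
const walk = (root) => {
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {            // null for closed shadow roots
      out.push(el.shadowRoot.innerHTML);
      walk(el.shadowRoot);          // shadow roots can be nested
    }
  }
};
walk(document);
return out;
"""


def dump_open_shadow_roots(driver):
    """Return a list of HTML strings, one per open shadow root in the page.

    Selenium's execute_script converts the returned JS array into a list.
    """
    return driver.execute_script(SHADOW_DUMP_JS)


def any_shadow_root_contains(driver, needle):
    """True if `needle` appears in any open shadow root's HTML."""
    return any(needle in html for html in dump_open_shadow_roots(driver))


# Hypothetical usage:
# print(any_shadow_root_contains(driver, "secondarySection"))
```

If this returns an empty list, shadow DOM is probably not the cause. An iframe is another candidate to rule out.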