I'm trying to scrape a site that requires a login. I have already got the login working and can scrape the site successfully.

The problem I'm having is that the values I'm trying to scrape don't appear until a few seconds after the page has loaded. I'm scraping a Siemens OZW772.04 controller unit, which has to fetch temperatures in degrees Celsius, and it takes a few seconds before they have all loaded.

So after I have scraped the page, the text is all right, but all the values look like this:

Datapoint   Value       
HN1 heat-in \n
---
 \n     \n

 \n 
 HN2 room-temp \n
 ---
 \n      \n

 \n 

The \n entries are where the values I need should be. I have already tried simply making the thread wait, but that doesn't work either.

Ólafur Aron
  • You're going to have to provide a heck of a lot more details about your implementation in order for us to be able to help you. – Wug Sep 12 '12 at 15:53
  • If you load it in a browser, is there also a visible delay after the page renders? If so, check whether the page is making an AJAX request to fetch the data, as a lot of these devices seem to prefer. That request is what you would need to scrape as well. – Alex K. Sep 12 '12 at 15:53
  • In fact, if the page loads with placeholder values then that's what must be going on. Look at the network tab in your browser debugger and see whether the page fetches the data separately from the page itself. – Alex K. Sep 12 '12 at 15:56
  • Is there any other way to get to the data? SNMP? OPC? – RQDQ Sep 12 '12 at 15:57
  • Values that aren't populated in the initial transfer are not populated automagically. More than likely there is some JavaScript that polls for the updated values and inserts them into the content, and that script must be executed. You should find out whether the device has a web service you can poll for your values rather than trying to screen-scrape. Also, you need to show some code, or at least specify which .NET classes you are using to get the page. – JamieSee Sep 12 '12 at 15:57

1 Answer

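If the web page uses JavaScript to load or modify the HTML, you will not see those changes unless you render the page as a browser would (i.e., using a browser engine). Making your thread wait does not help in that case, because the HTML you already downloaded is a static snapshot and its scripts never run. You can use WebKit with the WebKit.NET library, or one of several alternatives.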
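
As the comments on the question suggest, another option is to skip rendering and repeatedly request the data until the placeholders have been replaced. Below is a rough, language-agnostic sketch of that polling loop, written in Python only for brevity. Everything in it is an assumption rather than anything specific to the OZW772.04: `fetch` is a hypothetical callable standing in for your own authenticated request to whatever URL the browser's network tab reveals, and the `"---"` placeholder is a guess at what an unloaded value looks like.

```python
import time
from typing import Callable, Dict, Optional

# Assumption: the text shown in place of a value that has not loaded yet.
PLACEHOLDER = "---"


def values_ready(values: Dict[str, str]) -> bool:
    """Return True once every datapoint holds a real (non-blank, non-placeholder) value."""
    return bool(values) and all(
        v.strip() and v.strip() != PLACEHOLDER for v in values.values()
    )


def poll_until_ready(
    fetch: Callable[[], Dict[str, str]],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Dict[str, str]]:
    """Call fetch() repeatedly until all values are populated or the timeout expires.

    fetch is hypothetical: it should re-request the data and return
    a mapping of datapoint name to raw value text.
    Returns the populated mapping, or None if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        values = fetch()  # a fresh request each time; sleeping alone changes nothing
        if values_ready(values):
            return values
        if clock() >= deadline:
            return None
        sleep(interval)
```

The important detail is that each iteration issues a new request. Waiting on a page you have already downloaded cannot change its contents, which is likely why the thread wait had no effect.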

schellack