
I want to grab the HTML from the website "www.localbanya.com", which lists products. The products are displayed in two stages:

  1. On page load, around 8-10 products are displayed.
  2. When the user scrolls down, more products are loaded.

Because the additional products are loaded by JavaScript, I am not able to get the whole page source using WebClient.

Is there any way to get the updated page source with the WebClient class in .NET, or is there an alternative I can use to retrieve the HTML of the whole page at once?

For reference, see the localbanya product page.

Any help will be appreciated.

Abbas

1 Answer


WebClient only downloads the raw response from the server; it doesn't run the JavaScript on the page.

So you're going to need some sort of headless browser to do it.

There are many options for it, though I don't know any C# or .NET implementation.

You could look into PhantomJS and other headless browsers, which replicate what a normal browser does and can be driven by scripts you write.

Also refer to this question: Headless browser for C# (.NET)?
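
As a rough illustration of the headless-browser approach, here is a minimal sketch that uses Selenium WebDriver with headless Chrome. This is one possible option and not something the answer above specifically recommends. It assumes the Selenium.WebDriver NuGet package and a matching chromedriver are installed. The URL is a placeholder for the actual product page, and the scroll limit and wait time are arbitrary guesses that would need tuning.

```csharp
using System;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class ScrollingScraper
{
    static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // Placeholder URL: substitute the product listing page you want to scrape.
            driver.Navigate().GoToUrl("http://www.localbanya.com/");

            var js = (IJavaScriptExecutor)driver;
            long lastHeight = -1;

            // Scroll to the bottom repeatedly until the page stops growing
            // (or an arbitrary cap of 20 scrolls is reached).
            for (int i = 0; i < 20; i++)
            {
                js.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);");
                Thread.Sleep(2000); // crude wait for the next batch of products

                long height = Convert.ToInt64(
                    js.ExecuteScript("return document.body.scrollHeight;"));
                if (height == lastHeight)
                {
                    break; // nothing new was loaded
                }
                lastHeight = height;
            }

            // PageSource reflects the DOM after the scripts have run.
            string html = driver.PageSource;
            Console.WriteLine("Retrieved {0} characters of HTML", html.Length);
        }
    }
}
```

The fixed sleep is the weakest part of this sketch; waiting for a specific element to appear would be more reliable, but that depends on the page's markup.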

You can also run something like Fiddler to see what requests were made from the page when scrolling down, to reverse engineer how the data is retrieved, and replicate that with a WebClient if possible.
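
If Fiddler shows that scrolling triggers a simple paged request, replaying it with WebClient might look roughly like the sketch below. The endpoint URL, the `page` parameter, and the "empty response means no more products" stop condition are all hypothetical; the real values have to be taken from the captured traffic.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

class PagedProductFetcher
{
    // Hypothetical endpoint: replace with the request URL Fiddler shows on scroll.
    const string EndpointTemplate = "https://www.example.com/products/load?page={0}";
    const int MaxPages = 50; // safety cap so the loop always terminates

    static void Main()
    {
        var chunks = new List<string>();

        using (var client = new WebClient())
        {
            for (int page = 1; page <= MaxPages; page++)
            {
                // Set headers before each request. Some sites only respond to
                // requests that look like AJAX calls made by the page itself.
                client.Headers["X-Requested-With"] = "XMLHttpRequest";
                client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0";

                string body;
                try
                {
                    body = client.DownloadString(string.Format(EndpointTemplate, page));
                }
                catch (WebException ex)
                {
                    Console.WriteLine("Request for page {0} failed: {1}", page, ex.Message);
                    break;
                }

                // Assumed stop condition: an empty response means no more products.
                if (string.IsNullOrWhiteSpace(body))
                {
                    break;
                }
                chunks.Add(body);
            }
        }

        Console.WriteLine("Fetched {0} chunk(s)", chunks.Count);
    }
}
```

The response may be HTML fragments or JSON, so the collected chunks would still need parsing in whatever format the site actually returns.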

halfer
Madushan