I am trying to use Node.js to implement web scraping. I use axios to GET the HTML file and cheerio to extract the data.

However, I found that the returned HTML contains only the layout, not the data. I guess the website loads the layout first, then queries the data via AJAX and renders it afterwards.

So, does anyone know how to GET the full HTML including the data? Are there any libraries or tools for this?

Thanks.

Chester
  • Possible duplicate of [How can I scrape pages with dynamic content using node.js?](https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js) – t.niese Feb 06 '19 at 06:23
  • There are various questions about this topic here on Stack Overflow, and many sites that cover it as well. What they all have in common is that they use or suggest a fully featured browser engine to load the page. – t.niese Feb 06 '19 at 06:26

1 Answer

I would suggest using the Selenium library together with bs4 (Beautiful Soup) in Python, if you have some experience with Python.

For Node.js, there is the selenium-webdriver package:

https://www.npmjs.com/package/selenium-webdriver

I have written a scraper in Python using both libraries.

The scraper is for LinkedIn profiles: it takes names from an Excel file, searches for each one, and, if data is available, adds it to another Excel file.

https://github.com/harsh4870/Scraper_LinkedIn

For Node.js, the code goes like this:

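    const { Builder } = require("selenium-webdriver"); // also needs Firefox and geckodriver
    async function fetchPageSource(url) {
      const driver = await new Builder().forBrowser("firefox").build();
      try { await driver.get(url); return await driver.getPageSource(); } // HTML as the browser sees it
      finally { await driver.quit(); } // always close the browser
    }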
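
One caveat: if the page source is read right after `get` returns, data that the site loads via AJAX may not be in the DOM yet. Below is a minimal sketch of waiting for an element to appear before reading the source. The helper name `getRenderedHtml`, the CSS selector, and the 10-second timeout are assumptions for illustration, so pick a selector that only matches once the target site has rendered its data.

```javascript
// Load `url` and return the HTML once `selector` matches at least one element.
// `driver` is a selenium-webdriver WebDriver instance; `selector` and
// `timeoutMs` are hypothetical values to adapt to the target site.
async function getRenderedHtml(driver, url, selector, timeoutMs = 10000) {
  await driver.get(url);
  // Poll until the AJAX-rendered element exists; rejects if timeoutMs elapses.
  await driver.wait(
    async (d) => (await d.findElements({ css: selector })).length > 0,
    timeoutMs
  );
  return driver.getPageSource();
}

// Example wiring against a real browser (requires Firefox and geckodriver).
// Defined here but not invoked; call main() to try it.
async function main() {
  const { Builder } = require("selenium-webdriver"); // npm install selenium-webdriver
  const driver = await new Builder().forBrowser("firefox").build();
  try {
    const html = await getRenderedHtml(driver, "http://example.com", "h1");
    console.log(html.length);
  } finally {
    await driver.quit(); // always close the browser
  }
}
```

The returned string can then be passed to `cheerio.load(html)` and queried exactly as with the HTML fetched through axios.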
Harsh Manvar