
I am writing a crawler for a specific site and need to get its HTML elements. How can I do that?

Sorry, I don't know the exact term (DOM tree?). By "HTML elements" I mean what the browser actually displays. In Chrome, the developer tools show them in the Elements panel, and that is what I need. It is not the same as the HTML source file: on my target site the source contains only an empty div, but in the Elements panel that div holds the data I need.

I already know how to get the source file. How can I get the real elements, as they are after all processing has finished?

I tried libcurl, but its example code, at least, only fetches the source.

Please help me!

(I need C++, but C# or Java is OK.)

Redwings
  • Duplicate? https://stackoverflow.com/questions/17921697/jsoup-like-html-parser-for-c – willll Jul 13 '18 at 13:30
  • The fact they are not directly there implies that elements are being generated within the JavaScript of the page. You would likely need to actually process the JS given with something like webkit. – DavidBittner Jul 13 '18 at 13:31
  • @DavidBittner, then how can I do that? Can you recommend a library? – Redwings Jul 13 '18 at 13:38
  • @willll Is it the same question? Do you think I need an HTML parser? – Redwings Jul 13 '18 at 13:40
  • @Redwings I really think your best bet is something such as the QtWebEngine. It's a large dependency, but what you're requesting is honestly very difficult. Not only do you have to parse HTML (which isn't a big deal), you need to spin up a JS VM to manipulate the DOM how the page requests. – DavidBittner Jul 13 '18 at 14:01
  • If you do this yourself you'll end up writing about 90% of a web browser. There are several embedded browser projects you could make use of though (Chromium Embedded, QtWebEngine, etc). – Miles Budnek Jul 13 '18 at 14:03
  • @DavidBittner Oh... that's really sad. Thank you for the help. Can you tell me the exact term for 'HTML elements'? I didn't know the name, so I couldn't search for it on Google. – Redwings Jul 13 '18 at 14:11
  • @Redwings you had it right actually! An HTML element is an element in the DOM such as a div, p, a, etc. – DavidBittner Jul 13 '18 at 15:54
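
The comments above point to embedding a browser engine (QtWebEngine, Chromium Embedded). A lighter variation, not suggested in the thread, is to let an already installed Chromium-based browser render the page and print the resulting DOM. The following is only a minimal sketch, assuming a POSIX system (`popen`) and a browser binary that accepts the `--headless` and `--dump-dom` flags. The helper names `shell_quote`, `build_dump_dom_command` and `fetch_rendered_dom`, and the binary name passed in by the caller, are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap a string in single quotes so a POSIX shell treats it as one literal word.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen quote
        else out += c;
    }
    out += "'";
    return out;
}

// Build the command line that asks a headless browser to print the DOM
// as it looks after the page's scripts have run.
// The browser binary name (e.g. "chromium") is an assumption supplied by the caller.
std::string build_dump_dom_command(const std::string& browser,
                                   const std::string& url) {
    assert(!browser.empty() && "browser binary name must not be empty");
    return shell_quote(browser) + " --headless --disable-gpu --dump-dom " +
           shell_quote(url);
}

// Run the command and return everything the browser wrote to stdout.
std::string fetch_rendered_dom(const std::string& browser,
                               const std::string& url) {
    const std::string cmd = build_dump_dom_command(browser, url);
    FILE* pipe = popen(cmd.c_str(), "r");  // POSIX, not standard C++
    if (!pipe) throw std::runtime_error("popen failed");

    std::string html;
    std::array<char, 4096> buf{};
    std::size_t n = 0;
    while ((n = fread(buf.data(), 1, buf.size(), pipe)) > 0) {
        html.append(buf.data(), n);
    }
    if (pclose(pipe) != 0) {
        throw std::runtime_error("browser exited with an error");
    }
    return html;
}
```

A call such as `fetch_rendered_dom("chromium", "https://example.com")` would return the serialized DOM as a string, which can then be handed to an ordinary HTML parser. Data that the page loads some time after the initial load may still be missing, in which case an embedded engine that can wait for the page's scripts is probably the safer route.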
