
I want to write a program that opens a connection to a page, for example "https://en.wikipedia.org", and collects every URL that the page requests from the server while loading its content.

I mean the requests you see in Chrome DevTools under the Network tab: all the network requests the current page initiates to load content from the server. Can I get these requests in my Java or C# program, and if so, how?

I looked at some utilities such as "jsoup", but they all seem to work only by parsing the page source.

JiboOne

1 Answer


First, you have to parse the whole HTML file that you get from the server. For instance, if you request https://en.wikipedia.org/wiki/Main_Page, you should extract the following elements from the HTML file:

  • all HTML pages referenced inside <a ...>...</a> tags
  • all favicons referenced in <link> tags
  • all stylesheets and script includes at the top and bottom of the file
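The extraction step listed above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the class name `ResourceLinkExtractor` and the method `extract` are made up for this example, and a regular expression stands in for a real HTML parser such as jsoup, which would be more robust. It only finds references written in the static page source, not URLs that JavaScript builds at runtime.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class ResourceLinkExtractor {

    // Simplified pattern: the first href="..." or src="..." attribute
    // inside an <a>, <link>, <script> or <img> tag.
    private static final Pattern REFERENCE = Pattern.compile(
            "<(?:a|link|script|img)\\b[^>]*?\\b(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the referenced URLs in document order, without duplicates,
    // resolved against the URL the HTML was loaded from.
    public static List<String> extract(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = REFERENCE.matcher(html);
        while (matcher.find()) {
            try {
                found.add(base.resolve(matcher.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip attribute values that are not valid URIs.
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // Assumed sample markup; a real program would download the page first.
        String html = "<link rel=\"icon\" href=\"/static/favicon.ico\">"
                + "<script src=\"/w/load.js\"></script>"
                + "<a href=\"/wiki/Java\">Java</a>";
        for (String url : extract(html, "https://en.wikipedia.org/wiki/Main_Page")) {
            System.out.println(url);
        }
    }
}
```

Resolving each reference against the page URL turns relative paths into absolute URLs, so the result can be requested directly.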

This gives you all references from the page https://en.wikipedia.org/wiki/Main_Page. If you then repeat the same extraction on each referenced page, the mechanism works as a URL tree walker.
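One possible reading of the tree-walker idea is a breadth-first traversal with a depth limit. The sketch below is an assumption about how that might look, not the answerer's actual implementation: `UrlTreeWalker`, `walk`, and the `linksOf` function are hypothetical names, and `linksOf` stands for whatever code downloads a page and returns the URLs it references. The demo uses an in-memory map instead of real HTTP requests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of any library.
public class UrlTreeWalker {

    // Visits startUrl and everything reachable within maxDepth link hops,
    // level by level, and never visits the same URL twice.
    public static List<String> walk(String startUrl, int maxDepth,
                                    Function<String, List<String>> linksOf) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> current = new ArrayDeque<>();
        current.add(startUrl);
        seen.add(startUrl);
        for (int depth = 0; depth <= maxDepth && !current.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String url : current) {
                visited.add(url);
                if (depth == maxDepth) {
                    continue; // Do not expand pages on the last level.
                }
                for (String link : linksOf.apply(url)) {
                    if (seen.add(link)) {
                        next.add(link);
                    }
                }
            }
            current = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Assumed link structure standing in for downloaded pages.
        Map<String, List<String>> site = new HashMap<>();
        site.put("/a", List.of("/b", "/c"));
        site.put("/b", List.of("/a", "/d"));
        System.out.println(walk("/a", 2, url -> site.getOrDefault(url, List.of())));
    }
}
```

The depth limit and the `seen` set matter in practice, because pages usually link back to each other and an unbounded walk would not terminate on a large site.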

If you have any questions about the implementation, please ask; I did a similar implementation in my last student project.

theexiile1305
M.Fuchs, thanks for the answer. Could you please show a code snippet of the implementation? By the way, "MyPage" builds some AJAX requests using JavaScript, gets some tokens, generates URLs, and so on. The page source does not contain the target URLs in its tags. – JiboOne Jul 29 '18 at 04:54