0

I need to download a data feed from this website:

http://www.oddsportal.com/soccer/argentina/copa-argentina/rosario-central-racing-club-hnmq7gEQ/

Using the developer tools in Chrome, I was able to find this link

http://fb.oddsportal.com/feed/match/1-1-hnmq7gEQ-1-2-yj45f.dat

which contains everything I need. The question is how to get to the second link programmatically (preferably in Java) when I only know the first.

Thanks in advance for any useful help.

Kai
Josef Ondrej
  • Do you just intend to download the website's source code? If so, see this: http://stackoverflow.com/questions/238547/how-do-you-programmatically-download-a-webpage-in-java – Kai Oct 27 '15 at 14:58
  • @Kai No, I can download the source code. I need to download the external files the website loads, specifically the .dat file I mentioned. I can find this file in Chrome manually, but I have no idea how to do it programmatically. – Josef Ondrej Oct 27 '15 at 15:02

2 Answers

0

You can use a library such as jsoup in Java to scrape a page.

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();

Once you have the document, you can query the links on that page and collect them:

Elements links = doc.select("a[href]");

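Then loop through the collected links and follow them.

for (Element link : links) {
   // "abs:href" resolves relative links against the page URL.
   // A new variable name is needed here: doc is already declared above.
   Document linked = Jsoup.connect(link.attr("abs:href")).get();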
}
Dean Meehan
  • I can do that, but the link I need to find isn't in the source code. I think it is somehow loaded with javascript(?) but I don't know how exactly. – Josef Ondrej Oct 27 '15 at 15:08
  • Look at Rhino as documented here http://stackoverflow.com/questions/2670082/web-crawler-that-can-interpret-javascript – Dean Meehan Oct 27 '15 at 16:34
0

This is quite similar to this issue. You can use the approach from there to get a String listing every resource the page loaded, then search that string for the link you are looking for. It can look like this.

First start ChromeDriver and navigate to the page you wish to scrape.

WebDriver driver = new ChromeDriver();
driver.get("http://www.oddsportal.com/soccer/argentina/copa-argentina/rosario-central-racing-club-hnmq7gEQ/");

Then collect the list of loaded resources into a string:

String scriptToExecute = "var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return network;";
String netData = ((JavascriptExecutor) driver).executeScript(scriptToExecute).toString();

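And finally search the string for the desired link:

// Look for ".dat" only after the host name, so an earlier ".dat" elsewhere cannot produce a bad range.
int start = netData.indexOf("fb.oddsportal");
int end = netData.indexOf(".dat", start) + 4;
netData = netData.substring(start, end);
System.out.println(netData);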
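
If the index arithmetic feels fragile, a regular expression is one possible alternative for the search step. The following is only a sketch: the class name FeedUrlFinder, the helper findFeedUrl, and the sample string are hypothetical, and the pattern assumes every feed link has the same shape as the single example in the question (host fb.oddsportal.com, path /feed/match/, extension .dat). It returns the full URL including the scheme, and an empty Optional when nothing matches, instead of throwing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeedUrlFinder {
    // Assumed URL shape, inferred only from the one example link in the question.
    private static final Pattern FEED_URL =
            Pattern.compile("https?://fb\\.oddsportal\\.com/feed/match/[^\\s,\"'}]+?\\.dat");

    // Returns the first feed URL found in the text, or an empty Optional if there is none.
    static Optional<String> findFeedUrl(String netData) {
        Matcher m = FEED_URL.matcher(netData);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string produced by executeScript(...).toString().
        String sample = "[{name=http://www.oddsportal.com/res/x/global.js, duration=12.5}, "
                + "{name=http://fb.oddsportal.com/feed/match/1-1-hnmq7gEQ-1-2-yj45f.dat, duration=80.1}]";
        System.out.println(findFeedUrl(sample).orElse("feed URL not found"));
    }
}
```

In the Selenium code above, this would be called as FeedUrlFinder.findFeedUrl(netData) in place of the substring step.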
Community
  • That's exactly what I was looking for. I'm just wondering if it would be possible to do this without loading the whole website (which slows the whole process). – Josef Ondrej Oct 27 '15 at 16:27