I have a school project to parse web pages and use the result like a database. When I tried to download the data from https://www.marathonbet.com/en/betting/Football/, I didn't get all of it.

Here is my code:

Document doc = Jsoup.connect("https://www.marathonbet.com/en/betting/Football/").get();
Elements newsHeadlines = doc.select("div#container_EVENTS");

for (Element e: newsHeadlines.select("[id^=container_]")) {
    System.out.println(e.select("[class^=block-events-head]").first().text());
    System.out.println(e.select("[class^=foot-market]").select("[class^=event]").text());
} 

This is the end of the output I get (the last of the displayed leagues):

Football. Friendlies. Internationals All bets Main bets
1. USA 2. Mexico 16 Apr 01:30 +124 7/5 23/10 111/50 +124

All the leagues above it on the page are displayed.

Why don't I get full data? Thank you for your time!

Pshemo
poppytop
  • "i do get some data but not all" which data you didn't get? – Pshemo Apr 15 '15 at 17:26
  • About half of the list is missing. I only get the leagues up to Football. Friendlies. Internationals. – poppytop Apr 15 '15 at 17:29
  • Can you show one result you want to get which is skipped? – Pshemo Apr 15 '15 at 17:34
  • For example, this is one league: Football. England. League 2 1. Burton Albion 2. Carlisle United 19:45 +83 51/100 13/4 34/5 +83. But I want all of them like this. – poppytop Apr 15 '15 at 17:38
  • I can't reproduce your problem. `1. Burton Albion 2. Carlisle United 19:45 +83 51/100 13/4 34/5 +83` is printed fine (with additional `All bets Main bets` text but this seems fine). Please [edit] your question where you will explain what you expect to happen, and what happens instead. – Pshemo Apr 15 '15 at 17:59
  • Could be that the website uses JavaScript to load data, and Jsoup does not support js. You could try to disable javascript in a desktop webbrowser, and see if the site still works correctly. – Jonas Czech Apr 15 '15 at 18:15
  • @JonasCz When you print content of `doc.toString()` (for instance to file since it may be too big for console) you will notice that there is HTML code responsible for generating `Football. England. League 2 1. Burton Albion 2. Carlisle United 19:45 +83 51/100 13/4 34/5 +83` so Jsoup should be able (and is) to find it. For now this question is unclear about what precisely is not working as it should (and why OP thinks that something should work). – Pshemo Apr 15 '15 at 18:18
  • @Pshemo, Was just suggesting this, since this sort of problem of missing data on webpage is usually caused by lack of JS support. The problem must be somewhere else then, and OP should clarify. – Jonas Czech Apr 15 '15 at 18:21

1 Answer

Jsoup has a default body response limit of 2MB, and anything beyond that is cut off. You can change the limit to whatever you need with `maxBodySize(int)`. The documentation describes it as follows:

Set the maximum bytes to read from the (uncompressed) connection into the body, before the connection is closed, and the input truncated. The default maximum is 2MB. A max size of zero is treated as an infinite amount (bounded only by your patience and the memory available on your machine).

E.g.:

Document doc = Jsoup.connect(url).userAgent(ua).maxBodySize(0).get();

You might like to look at the other options in Connection, on how to set request timeouts, the user-agent, etc.
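To illustrate why a body size limit makes the later leagues disappear, here is a minimal plain-Java sketch of a capped read. It is not jsoup's actual implementation: `CappedRead` and `readCapped` are hypothetical names, and the sample markup is made up. The only thing borrowed from the documentation quoted above is the convention that a limit of zero means "no limit".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CappedRead {

    // Hypothetical helper: reads at most maxBytes from the stream.
    // A maxBytes of 0 is treated as "no limit", mirroring the maxBodySize convention.
    public static byte[] readCapped(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8];
        try {
            while (true) {
                int want = buffer.length;
                if (maxBytes > 0) {
                    int remaining = maxBytes - out.size();
                    if (remaining <= 0) {
                        break; // limit reached: the rest of the input is dropped
                    }
                    want = Math.min(want, remaining);
                }
                int read = in.read(buffer, 0, want);
                if (read == -1) {
                    break; // end of input
                }
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a long page of leagues.
        byte[] page = "<li>League A</li><li>League B</li><li>League C</li>"
                .getBytes(StandardCharsets.US_ASCII);

        String capped = new String(
                readCapped(new ByteArrayInputStream(page), 34), StandardCharsets.US_ASCII);
        String full = new String(
                readCapped(new ByteArrayInputStream(page), 0), StandardCharsets.US_ASCII);

        System.out.println(capped); // League C is missing
        System.out.println(full);   // all three leagues are present
    }
}
```

With the cap in place, everything after the limit never reaches the parser, so selectors simply find fewer elements and no error is raised. That matches the symptom in the question, where the output stops partway down the list.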

Jonathan Hedley