I am using Jsoup to extract data by zip code from a website. The zip codes are read from a text file, and the results are written to the console. I have around 1,500 zip codes. The program throws two kinds of exceptions:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=http://www.moving.com/real-estate/city-profile/...
java.net.SocketTimeoutException: Read timed out
I thought the solution was to fetch only a small amount of data at a time. So I used a counter: after every 200 zip codes read from the text file, the program pauses for 5 minutes. As I said, I still get the exceptions. So far, whenever an exception occurs, I copy and paste the data collected up to that point and then restart the program from the remaining zip codes. But I want to read all the data without interruptions. Is this possible? Any hint will be appreciated. Thank you in advance!
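A minimal sketch of one alternative to the fixed 5-minute pause, assuming most failures are transient (the server briefly overloaded or slow to answer): retry each request a few times, doubling the wait after every failure, and give up on that single zip code instead of stopping the whole program. Both `org.jsoup.HttpStatusException` and `java.net.SocketTimeoutException` are subclasses of `IOException`, so one catch clause covers both. The class `Retry`, the interface `IOSupplier`, and the attempt and delay values are hypothetical names and numbers chosen for illustration; they are not part of Jsoup.

```java
import java.io.IOException;
import java.util.Optional;

public class Retry {

    /** Hypothetical helper: like Supplier, but allowed to throw IOException (as Jsoup's get() does). */
    @FunctionalInterface
    public interface IOSupplier<T> {
        T get() throws IOException;
    }

    /**
     * Calls task up to maxAttempts times. After each IOException it waits
     * baseDelayMs, then 2x, 4x, ... that delay before the next attempt.
     * Returns Optional.empty() if every attempt failed, so the caller can
     * note the failure and move on instead of aborting the whole run.
     */
    public static <T> Optional<T> withBackoff(IOSupplier<T> task, int maxAttempts, long baseDelayMs) {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(task.get());
            } catch (IOException e) {
                // Catches HTTP status errors and read timeouts alike (both are IOExceptions)
                System.err.println("Attempt " + attempt + " failed: " + e);
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the last attempt
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    break;
                }
                delay *= 2; // exponential backoff
            }
        }
        return Optional.empty();
    }
}
```

Inside the loop it could wrap the fetch, e.g. `Optional<Document> doc = Retry.withBackoff(() -> Jsoup.connect(str).timeout(30000).get(), 4, 10000);`, where `timeout(int millis)` sets Jsoup's request timeout. The 30-second timeout, 4 attempts and 10-second base delay are arbitrary starting points to tune against the site. If `doc` comes back empty, the zip code can be logged and the loop simply continues.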
This is my code for reading all data:
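```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ZipIncome { // class name is illustrative
    public static void main(String[] args) throws IOException, InterruptedException {
        int count = 0;
        // "zips.txt" is a placeholder for the input file (one zip code per line)
        try (BufferedReader br = new BufferedReader(new FileReader("zips.txt"))) {
            String s;
            // readLine() returns null at end of file; br.ready() is not a reliable EOF test
            while ((s = br.readLine()) != null) {
                s = s.trim();
                if (s.isEmpty()) continue; // skip blank lines
                count++;
                String str = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=" + s;
                Document doc = Jsoup.connect(str).get();
                for (Element table : doc.select("table.DataTbl")) {
                    for (Element row : table.select("tr")) {
                        Elements tds = row.select("td");
                        // The value is read from the third cell, so require at least 3 cells
                        if (tds.size() > 2 && tds.get(0).text().contains("Per capita income")) {
                            System.out.println(s + "," + tds.get(2).text());
                        }
                    }
                }
                if (count % 200 == 0) {
                    // Print before sleeping so the message appears when the pause starts
                    System.out.println("Stopped for 5 minutes");
                    Thread.sleep(300000); // 5 minutes
                }
            }
        }
    }
}
```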
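Instead of copying the partial output by hand after a crash, the loop could record the zip codes whose request failed and retry only those in a later pass. The sketch below assumes the per-zip lookup (the Jsoup fetch plus the table scan above) is packaged as a function that takes a zip code and either returns the value or throws `IOException`. `ZipPasses`, `ZipFetcher`, `run`, and the pass-count and pause parameters are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ZipPasses {

    /** Hypothetical: one lookup for one zip code; may throw IOException like Jsoup's get(). */
    @FunctionalInterface
    public interface ZipFetcher {
        String fetch(String zip) throws IOException;
    }

    /**
     * Makes up to maxPasses passes over the zip codes. A failed zip code is
     * not fatal: it is remembered and retried in the next pass, after a
     * pause of pauseMs. Returns the zip codes that still failed at the end.
     */
    public static List<String> run(List<String> zips, ZipFetcher fetcher, int maxPasses, long pauseMs) {
        List<String> pending = new ArrayList<>(zips);
        for (int pass = 1; pass <= maxPasses && !pending.isEmpty(); pass++) {
            List<String> failed = new ArrayList<>();
            for (String zip : pending) {
                try {
                    System.out.println(zip + "," + fetcher.fetch(zip));
                } catch (IOException e) {
                    System.err.println("Pass " + pass + ", zip " + zip + " failed: " + e);
                    failed.add(zip); // keep it for the next pass
                }
            }
            pending = failed;
            if (!pending.isEmpty() && pass < maxPasses) {
                try {
                    Thread.sleep(pauseMs); // give the server a rest before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    break;
                }
            }
        }
        return pending;
    }
}
```

Any zip codes still in the returned list after the last pass could be written to a file and checked separately, for example to see whether those particular pages fail consistently.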