I have a Java Spark application that retrieves data from a website as follows:
while (true) {
    try {
        connection = (HttpURLConnection) uRL.openConnection();
        /* optional; the default method is GET */
        connection.setRequestMethod("GET");
        /* add request header */
        connection.setRequestProperty("User-Agent", USER_AGENT);
        /* set the read timeout before the request is sent; 0 means no timeout */
        connection.setReadTimeout(0);
        /* read the response code (this sends the request) */
        connection.getResponseCode();
        /* open a reader on the response body */
        bufferedReader = new BufferedReader(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
        break;
    } catch (Exception e) {
        /* log the failure and retry */
        LOGGER.error("Error in querying Wikipedia: " + e.getMessage());
        continue;
    }
}
response = new StringBuffer();
while ((inputLine = bufferedReader.readLine()) != null) {
    response.append(inputLine);
    response.append("\n");
}
bufferedReader.close();
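The read loop could equivalently be written with try-with-resources, so that the reader is closed even if readLine throws. This is only a minimal sketch; the class name BodyReader and the method readAll are made up for illustration and are not part of my application:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BodyReader {
    // Hypothetical helper: reads every line from the source and joins them with '\n'.
    static String readAll(Reader source) throws IOException {
        StringBuilder response = new StringBuilder();
        // try-with-resources closes the reader whether or not readLine throws
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the InputStreamReader over the connection
        System.out.print(readAll(new StringReader("first\nsecond")));
    }
}
```

In the real application the argument would be the InputStreamReader wrapped around connection.getInputStream().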
This code works well on Windows.
However, on a CentOS machine that sits behind an HTTP and HTTPS proxy server, it fails with a connection timeout. I set the system properties for the HTTPS proxy in the application and verified that it works for some links, but it still fails for others.
For the links that fail, I also tried the same URL with wget on the Linux server, and it worked.
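For reference, the proxy settings are applied roughly as in the sketch below. The host proxy.example.com and port 3128 are placeholders rather than my real values, and the class and method names are only for illustration. The second method shows an alternative I am aware of, passing an explicit java.net.Proxy to openConnection instead of relying on system properties:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

public class ProxyConfig {
    // Placeholder values; the real proxy host and port differ
    static final String PROXY_HOST = "proxy.example.com";
    static final int PROXY_PORT = 3128;

    // Sets the standard JVM-wide proxy properties for both HTTP and HTTPS
    static void setProxyProperties(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    // Alternative: build a Proxy object to pass to URL.openConnection(Proxy)
    static Proxy explicitProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        setProxyProperties(PROXY_HOST, PROXY_PORT);
        System.out.println(System.getProperty("https.proxyHost"));
        System.out.println(explicitProxy(PROXY_HOST, PROXY_PORT).type());
    }
}
```

With the explicit variant, the connection would be opened as uRL.openConnection(ProxyConfig.explicitProxy(host, port)).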
Link that doesn't work:
https://ar.wikipedia.org/w/api.php?action=query&format=xml&titles=%D9%82%D8%B1%D9%89&redirects&prop=pageprops|categories&cllimit=500
Link that works:
https://ar.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=14&apfilterredir=nonredirects&aplimit=500
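One visible difference between the two links is the literal | character in the failing one (prop=pageprops|categories). I have not confirmed that this is the cause, but in case the proxy treats the unencoded character differently, a minimal sketch of percent-encoding it before opening the connection would look like this (the class and method names are hypothetical):

```java
public class PipeEncoding {
    // Replaces each literal '|' with its percent-encoded form, %7C
    static String encodePipes(String url) {
        return url.replace("|", "%7C");
    }

    public static void main(String[] args) {
        String failing = "https://ar.wikipedia.org/w/api.php?action=query&format=xml"
                + "&titles=%D9%82%D8%B1%D9%89&redirects&prop=pageprops|categories&cllimit=500";
        System.out.println(encodePipes(failing));
    }
}
```

The rest of the URL is left untouched, so the already encoded title is not encoded a second time.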