I am pulling a lot of data from a service through its built-in REST API. Each call takes about 600 ms to return a JSON document, and I need to fetch 495 of them.
For my original POC I just made the calls sequentially on the main thread (I didn't want the program to advance until all the queries were complete), and that took about 300 seconds. Now that the POC works, I need to optimize it, because a 5-minute query is far from ideal. What I am doing now is using an ExecutorService with a fixed thread pool, adding 495 tasks to it, and calling invokeAll() on them.
My only issue is that I am now getting bad data. Logically nothing should change: the queries only return 50 elements at a time, so all I am doing is changing the starting point (I've checked the URLs and there are no overlaps). Yet some results are missing and some existing results are duplicated. The code that processes the JSON has not changed; the only thing that changed is how the results are obtained.
I initially thought the problem was a variable being shared across threads without being atomic, but all that happens after I get the JSON is that I parse it, create a Requirement object, and add it to a Set. Since the Set is never reassigned, only added to, I was under the impression that atomicity wouldn't make a difference (I could be 100% wrong, however).
Below, the first snippet is how I run it sequentially on the main thread, and the second snippet is my multithreaded version. I know it is a bit messy; this is my POC to determine how much faster multithreading is (it currently goes from ~300 seconds to ~45 seconds) and whether it is worth applying to other calls in my program.
I just need to figure out why values are duplicated and missing when using multiple threads (there are no duplicates or missing values when the calls are made sequentially). The URL determines the starting point and the page size never changes, so I can't figure out why I am ~2000 requirements short and have 224 duplicate entries where there shouldn't be any at all.
The only things that changed are the ExecutorService and the loop in which I obtain the starting point (i.e. I calculate how many iterations I need instead of relying on the returned current position). All the createRequirements(obj) function does is parse the JSON further and create a Requirement object by passing the JSON data into the constructor.
private void obtainAllRequirements() {
    int startingLocation = 0;
    boolean continueQueries = true;
    String output = null;
    do {
        output = executeRESTCall(baseUrl + "/abstractitems?maxResults=50&itemType=43&startAt=" + startingLocation);
        JSONObject obj = new JSONObject(output);
        if ((obj.getJSONObject("meta").getJSONObject("pageInfo").getInt("totalResults") - startingLocation) <= 50) {
            continueQueries = false;
        }
        createRequirements(obj);
        startingLocation += 50;
    } while (continueQueries);
}
private void obtainAllRequirements() {
    String output = executeRESTCall(baseUrl + "/abstractitems?itemType=43&startAt=0");
    int totalResults = new JSONObject(output).getJSONObject("meta").getJSONObject("pageInfo").getInt("totalResults");
    ExecutorService service = Executors.newFixedThreadPool(THREADS);
    List<Callable<Void>> tasks = new ArrayList<>();
    // Note: if MAX_RESULTS is an int, totalResults / MAX_RESULTS truncates before
    // Math.ceil runs, so a final partial page would be skipped.
    for (int i = 0; i < Math.ceil(totalResults/MAX_RESULTS); i++) {
        final int iteration = i;
        tasks.add(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                System.out.println(baseUrl + "/abstractitems?maxResults=" + MAX_RESULTS + "&itemType=43&startAt=" + (iteration * MAX_RESULTS));
                String o = executeRESTCall(baseUrl + "/abstractitems?maxResults=" + MAX_RESULTS + "&itemType=43&startAt=" + (iteration * MAX_RESULTS));
                JSONObject obj = new JSONObject(o);
                createRequirements(obj);
                return null;
            }
        });
    }
    try {
        service.invokeAll(tasks);
        service.shutdown();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
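One thing worth noting about the multithreaded version: invokeAll() does not rethrow exceptions thrown inside call(). Each one is stored in the corresponding Future and only surfaces when get() is called, so a failed page can go unnoticed. Below is a minimal sketch of checking every Future. The class name FutureCheck, the helper countFailures, and the simulated failing tasks are hypothetical stand-ins for the real REST tasks; whether failed tasks actually explain the missing results here is an assumption that would need checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureCheck {

    // Hypothetical helper: runs all tasks, then calls get() on every Future
    // so that exceptions thrown inside call() become visible.
    static int countFailures(List<Callable<Void>> tasks, int threads) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        int failures = 0;
        try {
            List<Future<Void>> futures = service.invokeAll(tasks);
            for (Future<Void> future : futures) {
                try {
                    future.get(); // rethrows the task's exception wrapped in ExecutionException
                } catch (ExecutionException e) {
                    failures++;
                    System.err.println("Task failed: " + e.getCause());
                }
            }
        } finally {
            service.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int page = i;
            tasks.add(() -> {
                // Simulated failure on every third page (stand-in for a failed REST call).
                if (page % 3 == 0) {
                    throw new IllegalStateException("page " + page + " failed");
                }
                return null;
            });
        }
        System.out.println("failed tasks: " + countFailures(tasks, 4));
    }
}
```

In the real code the same loop over the list returned by service.invokeAll(tasks) would show whether any of the 495 page requests threw, for example from new JSONObject(o) when executeRESTCall returns an empty string.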
Edit: Here is what happens inside createRequirements; the constructor for Requirement just takes the JSON data and assigns the values to private member variables.
private void createRequirements(JSONObject json) {
    JSONArray dataArray = json.getJSONArray("data"); // Gets the data array in the JSON file
    for (int i = 0; i < dataArray.length(); i++) {
        JSONObject req = dataArray.getJSONObject(i);
        Requirement requirement = new Requirement(req);
        if (!requirement.INVALID_PROJECT) {
            requirements.add(requirement);
        }
    }
}
EDIT: I changed the requirements set to a concurrent set, but it made no difference.
this.requirements = ConcurrentHashMap.newKeySet(); // newKeySet() is a static factory, so no map instance is needed
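For comparison, here is a small self-contained sketch of adding to a set from several threads. It uses the static factory ConcurrentHashMap.newKeySet(), which returns a thread-safe Set; a plain HashSet is not safe for concurrent add() calls and can lose entries. The class name ConcurrentSetDemo, the method fill, and the integer payload are hypothetical. The real code stores Requirement objects, whose equals() and hashCode() are not shown here, so this only illustrates the set itself.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentSetDemo {

    // Hypothetical helper: 'pages' tasks each add 'pageSize' distinct integers
    // to one shared thread-safe set.
    static Set<Integer> fill(int pages, int pageSize, int threads) throws InterruptedException {
        Set<Integer> results = ConcurrentHashMap.newKeySet();
        ExecutorService service = Executors.newFixedThreadPool(threads);
        for (int p = 0; p < pages; p++) {
            final int start = p * pageSize;
            service.submit(() -> {
                for (int i = 0; i < pageSize; i++) {
                    results.add(start + i);
                }
            });
        }
        service.shutdown();
        // Wait for all submitted tasks to finish before reading the set.
        if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> results = fill(495, 50, 8);
        System.out.println(results.size()); // 495 * 50 = 24750 distinct values
    }
}
```

If a set like this still ends up with missing or duplicated entries, the set is probably not the cause, and the data coming back from the requests is the next thing to compare.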
EDIT: Added executeRESTCall.
public String executeRESTCall(String urlValue) {
    try {
        URL url = new URL(urlValue);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        String encoding = Base64.getEncoder()
                .encodeToString((Credentials.XXX + ":" + Credentials.XXX).getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + encoding);
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : " + conn.getResponseCode());
        }
        BufferedReader br = new BufferedReader(new InputStreamReader((conn.getInputStream())));
        return br.readLine();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "";
}
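Two details of executeRESTCall are worth a closer look: readLine() returns only the first line of the response, and any exception is swallowed so the caller receives an empty string. A minimal sketch of reading a whole stream into a String follows, assuming the response may span several lines (the service may well return single-line JSON, in which case readLine() is enough). The class BodyReader and the method readAll are hypothetical names; in the real method the InputStream would come from conn.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BodyReader {

    // Hypothetical helper: reads every character from the stream, decoding it as UTF-8,
    // and closes the reader. IOExceptions propagate to the caller instead of being swallowed.
    static String readAll(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                body.append(buffer, 0, read);
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // A multi-line body stands in for a REST response.
        String json = "{\n  \"data\": []\n}";
        InputStream in = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in).equals(json)); // true
    }
}
```

Letting the exception propagate instead of returning "" means a failed request shows up as a failed task rather than as a page that silently contributes nothing.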