0

I can't figure out how to covert the following sequential code to a multi-threaded properly. I have done my best to avoid any shared resource and complete thread independence.

This is the single-threaded code package com.net;

import jdk.incubator.http.HttpClient;
import jdk.incubator.http.HttpRequest;
import jdk.incubator.http.HttpResponse;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class ReqSingle {

    private float totalElapsedTime = 0F;
    private HttpClient client = HttpClient.newHttpClient();

    public ReqSingle(String[] sties){
        for (String site : sties){
            long fetchStartTime = System.currentTimeMillis();
            String html = getResource(site);
            float elapsed = (float) (System.currentTimeMillis() - fetchStartTime) / 1000;

            Document doc = Jsoup.parse(html);
            System.out.println("It took " + elapsed + " seconds to fetch " + site + " with title " + doc.title());
            totalElapsedTime += elapsed;
        }

        System.out.println("Total Elapsed Time: " + totalElapsedTime + "\nTotal Number of sites: " + sties.length);
    }

    private String getResource(String someUrl) {
        String body = "";
        try {
            URI url = new URI(someUrl);
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(url).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandler.asString());
            body = response.body();
        } catch (URISyntaxException e) {
            System.out.println("URL " + someUrl + "is not valid");
        } catch (IOException | InterruptedException e) {
            // do nothing
        }
        return body;
    }
}

This is my implementation of multi-threaded code:

package com.net;

import jdk.incubator.http.HttpClient;
import jdk.incubator.http.HttpRequest;
import jdk.incubator.http.HttpResponse;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class ReqThreaded {

    public ReqThreaded(String[] sites) {

        for (String rawUri : sites) {
            new Thread(new Site(rawUri)).start();
        }
    }

    class Site implements Runnable {
        URI uri;

        public Site(String rawUri) {
            try {
                uri = new URI(rawUri);
            } catch (URISyntaxException e) {
                System.out.println("URL " + rawUri + "is not valid");
            }
        }

        @Override
        public void run() {
            String html = "";

            long fetchStartTime = System.currentTimeMillis();
            try {
                HttpClient client = HttpClient.newHttpClient();
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(uri).GET().build();
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandler.asString());
                html = response.body();
            } catch (IOException | InterruptedException e) {
                //do nothing
            }
            float elapsed = (float) (System.currentTimeMillis() - fetchStartTime) / 1000;
            Document doc = Jsoup.parse(html);
            System.out.println("It took " + elapsed + " seconds to fetch " + uri.toString() + " with title " + doc.title());
        }
    }
}

And this is the tester class and it's output:

package com.net;

public class ReqTester {
    public static void main(String[] args) {

        String[] topIranianSites = {
                "https://www.aparat.com/",
                "http://www.varzesh3.com/",
                "http://namnak.com/",
                "http://www.telewebion.com/",
                "https://divar.ir/",
                "https://www.ninisite.com/",
                "https://www.blogfa.com/",
                "http://www.namasha.com/",
                "http://www.yjc.ir/"
        };

        System.out.println("Single Threaded Test Results:\n");
        new ReqSingle(topIranianSites);
        System.out.println("\n\nMulti Threaded Test Results:\n");
        new ReqThreaded(topIranianSites);

    }
}

Output:

WARNING: Using incubator modules: jdk.incubator.httpclient
Single Threaded Test Results:

It took 2.843 seconds to fetch https://www.aparat.com/ with title آپارات - سرویس اشتراک ویدیو
It took 0.459 seconds to fetch http://www.varzesh3.com/ with title 
It took 0.52 seconds to fetch http://namnak.com/ with title نمناک
It took 2.231 seconds to fetch http://www.telewebion.com/ with title تلوبیون | مرجع پخش زنده و دانلود فیلم ، سریال و سایر برنامه های تلویزیون
It took 0.243 seconds to fetch https://divar.ir/ with title 
It took 1.749 seconds to fetch https://www.ninisite.com/ with title نی نی سایت | راهنمای بارداری و بچه داری
It took 0.407 seconds to fetch https://www.blogfa.com/ with title 
It took 1.796 seconds to fetch http://www.namasha.com/ with title نماشا - سرویس رایگان اشتراک ویدیو
It took 0.232 seconds to fetch http://www.yjc.ir/ with title 
Total Elapsed Time: 10.48
Total Number of sites: 9


Multi Threaded Test Results:

It took 0.114 seconds to fetch https://divar.ir/ with title 
It took 0.236 seconds to fetch http://www.yjc.ir/ with title 
It took 1.519 seconds to fetch http://www.varzesh3.com/ with title 
It took 1.587 seconds to fetch https://www.blogfa.com/ with title 
It took 3.524 seconds to fetch http://namnak.com/ with title نمناک
It took 4.152 seconds to fetch https://www.ninisite.com/ with title نی نی سایت | راهنمای بارداری و بچه داری
It took 4.486 seconds to fetch http://www.namasha.com/ with title نماشا - سرویس رایگان اشتراک ویدیو
It took 4.725 seconds to fetch https://www.aparat.com/ with title آپارات - سرویس اشتراک ویدیو
It took 5.82 seconds to fetch http://www.telewebion.com/ with title تلوبیون | مرجع پخش زنده و دانلود فیلم ، سریال و سایر برنامه های تلویزیون
m.d
  • 169
  • 2
  • 9
  • Why not just use an `ExecutorService`? It would make the multithreading part a lot simpler. – Kayaman Mar 08 '18 at 17:59
  • 3
    Also, what is wrong with the multi-threaded code that you have? I don't see any errors. – markspace Mar 08 '18 at 18:00
  • 3
    What are you expecting to happen that isn't? – slambeth Mar 08 '18 at 18:01
  • @Kayman why not just figure out what's the problem with this code. Here I'm in the learning phase. – m.d Mar 08 '18 at 18:03
  • I do not see any problem with that code. Output is the same, except that it is sorted by execution time. Could you describe your problem? – Dorian Gray Mar 08 '18 at 18:04
  • @markspace this is obvious the mutlthreaded code as i have done it before in python should end a slightly longer than the longest fetch time here for instance namnak fetching took more than 3 seconds which is more than the single threaded one. This sounds wrong to me. Isn't it – m.d Mar 08 '18 at 18:06
  • @TobiasWeimer please read my answer to markspace – m.d Mar 08 '18 at 18:07
  • Well, multi-threading doesn't mean theat your code runs faster. I fact, you have more overhead because of the thread management – Dorian Gray Mar 08 '18 at 18:09
  • @TobiasWeimer OK the overhead can add some millis but the result of each thread should be close to single threaded code. if blogfa took 0.407 to go on the trip in the single threaded one it shouldnt jump to 1.587 in the multi-threaded variant. the numbers just don't seem right – m.d Mar 08 '18 at 18:12
  • That is the time taken by an individual request, which still runs in a single thread. Multithreading shouldn't have any effect on it. Maybe the site/network slowed down in that instance. – Vasan Mar 08 '18 at 18:15
  • Another issue here is that you have to 'warm up' JVM before any benchmark (let the JIT compiler do the necessary optimizations). Search SO - there are some good answers for that question. – zlakad Mar 08 '18 at 18:17
  • @Vasan no matter what urls i choose the multithreaded code trip time jumps. I think they should be close unless java uses some weird kind of multithreading – m.d Mar 08 '18 at 18:18
  • No. Read all the other answers please. – Dorian Gray Mar 08 '18 at 18:22
  • And what is the question? – Antoniossss Mar 08 '18 at 18:24
  • How could the error be obvious? You've trimmed off the total elapsed time for the multi-threaded case. – markspace Mar 08 '18 at 18:25
  • @TobiasWeimer still nobody told me why the trip time jumps. It is weird – m.d Mar 08 '18 at 18:25
  • @markspace i can include that but the real problem here is the trip time jumps the total elapsed time is slightly bigger than longest trip time – m.d Mar 08 '18 at 18:28
  • @m.d Well, Vasan told you a possible solution: "Maybe the site/network slowed down in that instance.". How do you expect a precise answer? Or how many people do you need to tell you that your code is correct? – Dorian Gray Mar 08 '18 at 18:28
  • Besides the end site, it could also be your ISP throttling you back because you're making too many requests too quickly. It could also be your own OS having a resource problem for the same reason. And some versions of Python don't actually implement multi-threading, so there could be a clue there too. – markspace Mar 08 '18 at 18:31
  • @TobiasWeimer i told Vasan that is the case no matter the urls i choose. I'm curious because numbers don't match the way a correct multithreaded code should work. – m.d Mar 08 '18 at 18:31
  • 1
    @markspace thanks for your time. So the code is correct i will experiment more with the code and other answers to get better insight into the problem – m.d Mar 08 '18 at 18:34
  • @m.d You need to run this a lot of times to arrive at a conclusion. Also, try measuring the time from non-Java requests to the same sites (maybe curl for eg). I think you'll see some requests are just slower. – Vasan Mar 08 '18 at 18:34
  • @Vasan I plan to experiment more and write the same code in python and also using curl. Then i will compare the results. – m.d Mar 08 '18 at 18:38

2 Answers2

1

Your code is correct. But it is recommended to control the number of threads by using some executor service implementation.

e.g:

ExecutorService executor = Executors.newFixedThreadPool(poolSize);
executor.submit(new Site(rawUri));

Even with large number of rawuris. It will keep a tap on the number of threads.

teeman12
  • 201
  • 1
  • 4
  • Although the code is already fine but many users have recommended the use of ExecutorService so i think it can be the answer – m.d Mar 09 '18 at 10:57
0

In your example, there's no reason to create threads yourself; you can simply utilize HttpClient#sendAsync. From its documentation:

Sends the given request asynchronously using this client and the given response handler.

Also, you should be using JMH to properly benchmark Java programs. See: How do I write a correct micro-benchmark in Java?

Jacob G.
  • 28,856
  • 5
  • 62
  • 116