
I have a Java app that reads JSON files containing SQL queries and fires them at a database using JDBC.

Now I have 50 thousand such files, and I need to spawn 50 thousand independent threads, each reading one file and uploading its contents into the database. Each thread must start at a specific time, i.e. after a specific number of seconds. For example, I have the following map of sorted login details telling me when to spawn each thread. Login times are in seconds: many threads are to be spawned at 0 seconds, some at 10 seconds, some at 50 seconds, and so on.

Map<String,Integer> loginMap = new HashMap<>(50000);

I am using a ScheduledExecutorService to schedule these threads. I have something like the following:

ScheduledExecutorService ses = Executors.newScheduledThreadPool(50000);
for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
    Integer loginTime = entry.getValue();
    ses.schedule(new MyWorker(entry.getKey()), loginTime, TimeUnit.SECONDS);
}

The above code works for a few thousand files, but it does not scale to 50 thousand. Also, since each worker uses a JDBC connection, the database is running out of connections.

I acquire the connection in the run method of the thread. Do these threads start executing run even when they are not yet supposed to run? I am new to multi-threading.

halfer
Umesh K
1 Answer


You don't want 50,000 threads! Each thread consumes some resources, particularly an area of RAM for its stack, which could be about 1 MB per thread. Do you have 50 GB of RAM?

There is also no benefit to running many more threads than you have cores.

This doesn't mean you can't queue 50,000 tasks on a sensible number of worker threads, sized to match the hardware:

ScheduledExecutorService ses = Executors.newScheduledThreadPool(8); // sensible, though it could be derived from actual hardware capabilities
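Here is a minimal, runnable sketch of that idea: a small fixed scheduler pool holds all the queued tasks, and a `Semaphore` caps concurrent database work (a stand-in for a real connection pool; the `ScheduledUploadDemo` class name, file names, and the `println` in place of the actual JDBC work are all illustrative). The demo uses `MILLISECONDS` so it finishes quickly; your real code would use `SECONDS`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ScheduledUploadDemo {
    // Caps concurrent database work; a real app would use a connection pool instead.
    private static final Semaphore DB_SLOTS = new Semaphore(10);

    public static void main(String[] args) throws InterruptedException {
        // Small fixed pool: 50,000 tasks can still be queued onto it.
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(8);

        Map<String, Integer> loginMap = new LinkedHashMap<>();
        loginMap.put("fileA.json", 0);
        loginMap.put("fileB.json", 0);
        loginMap.put("fileC.json", 1);

        CountDownLatch done = new CountDownLatch(loginMap.size());
        for (Map.Entry<String, Integer> entry : loginMap.entrySet()) {
            final String file = entry.getKey();
            ses.schedule(() -> {
                try {
                    DB_SLOTS.acquire();          // wait for a free "connection"
                    try {
                        // Real code: read the JSON file, fire its SQL over JDBC.
                        System.out.println("uploaded " + file);
                    } finally {
                        DB_SLOTS.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }, entry.getValue() * 100L, TimeUnit.MILLISECONDS); // real code: SECONDS
        }

        done.await();   // wait for all scheduled uploads to finish
        ses.shutdown();
    }
}
```

Note that the tasks only begin executing when the scheduler fires them at their delay; until then they just sit in the scheduler's queue, consuming no connection.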
weston
  • Hi Weston, thanks for the reply. I don't want to create that many threads, but the thing is I don't know how to schedule based on login seconds. E.g. at the 10th second 100 threads should be spawned, at the 20th second 40 threads should be spawned, and so on. – Umesh K Mar 05 '15 at 20:11
  • So are you stress testing logging in? Well looks fine, you just can't have so many threads. You can't get 1 PC to act like thousands. There's only so much it can do. Consider alternatives like getting more machines to log in at the same time. Or better, research how to properly stress test this area. – weston Mar 05 '15 at 20:16
  • It is not stress testing. The above code works because the threads don't all run at once; only a subset of threads runs at a time. I am just thinking of finding all login threads whose login time is the same and firing them together. Is that a good idea? – Umesh K Mar 05 '15 at 20:19
  • Yeah, just understand that if we limit it to 8 threads and 10 tasks come along with the same login time, some will be delayed, as the worker threads will all be busy. – weston Mar 05 '15 at 20:21
  • Your operations will be delayed anyway. There will be a queue in the network stack or in the database. The ultimate concurrency level is the count of processor cores; anything above that results in queueing somewhere. You should use connection pooling with e.g. 10 connections, and that should be enough for you. Correspondingly, use a thread pool with 10 threads so they'll utilize all those connections. More threads are a waste of RAM. – vbezhenar Mar 05 '15 at 20:29
  • It doesn't matter how many threads you have. Even if you create #cores threads, your tasks are IO bound. So you won't speed up anything. – fps Mar 05 '15 at 21:09
  • @Magnamag You'd be surprised. Try it, maybe opening 100 files into RAM, synchronously and then asynchronously with various numbers of worker threads. – weston Mar 06 '15 at 09:07
  • @weston I mean for the scale the OP is managing. When you work at that scale, you need to use asynchronous I/O and networking, queues, maybe an event-oriented framework. – fps Mar 06 '15 at 09:33
  • @Magnamag Right! so I'm saying try 100 files, you'll see a difference, OP has 50K files, and you're saying at that point you won't see a difference. – weston Mar 06 '15 at 09:34
  • @weston That's my point. As tasks are I/O bound, most of them would be waiting. – fps Mar 06 '15 at 09:40
  • @Magnamag Exactly, while some are waiting others can make use of the CPU. It's the reverse of what you think, because there's a lot of waiting for I/O, your CPU is idle. I've actually answered a similar question on this: http://stackoverflow.com/questions/28087393/reading-from-disk-and-processing-in-parallel/28092283#28092283 – weston Mar 06 '15 at 09:46
  • @weston And what would be the CPU being used for, if all threads are I/O bound? Any way, I'm not saying OP should not use threads. I'm just saying a few threads + asynchronous I/O + event-driven framework. What OP should not do is create 50k threads, that's my point here. – fps Mar 06 '15 at 12:12
  • "It doesn't matter how many threads you have. Even if you create #cores threads, your tasks are IO bound. So you won't speed up anything." Sure sounds like you're saying use of more than one thread is pointless. – weston Mar 06 '15 at 12:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/72406/discussion-between-magnamag-and-weston). – fps Mar 06 '15 at 12:20