
I have a multitenant Java EE system running on GlassFish 4 which receives roughly 500 printing-job requests at a time during a specific period of the year. Outside this period, resources are enough to handle the requests, but during it, the requests simply become too much for the server to handle, which leads to a lot of downtime. My idea to solve the problem is to bring some order to how these requests are handled: a sort of first-come-first-serve arrangement where one request is handled, then the next is attended to, until there are no more requests. I tried to build it as a service which continuously checks whether there are any requests and then services them in order.

From the solutions I have found on Stack Overflow and lots of searching online, I have narrowed it down to a few, but I have some concerns:

Scheduling: Most job-scheduling implementations I have seen require some recurring interval at which a task is performed. This won't work for my system because the time each printing job takes to complete depends on how many pages of reports are generated. It could be 5 pages or it could be 50. In other words, I don't know how long a request will take to be serviced.

Java Message Service (JMS): I thought about using JMS queues, but I just didn't understand how to relate it to my current situation. I understand it's meant for messaging and could perhaps solve part of my problem, but I have yet to see how.

Endless looping: This seems very tacky and quite frankly, a hack I would rather not even try on a Java EE application and on a system that's lacking resources.

In summary, I would appreciate suggestions on how to implement a system that endlessly receives requests, services each one irrespective of how long it takes, and then moves on to the next. If there aren't any requests, it waits. If there are too many requests, it simply services them in the order in which they were received.


First Edit: So after giving it some thought, considering multitenancy and the overall complexity of my current system, I decided to create another system which would receive client requests to generate these results. This proposed system would not itself generate reports, but would simply ask the current system to generate the report, which would subsequently be emailed to the client. The queuing of requests (I think) can be achieved in the proposed system. Now I just need to figure out how, as this system would also be a Java EE application. Maybe this is the point where some of these answers come into play. Your thoughts would be deeply appreciated.

jaletechs
  • What is the bottleneck in the current solution? – Thorbjørn Ravn Andersen Mar 25 '18 at 17:37
  • This isn't entirely clear. Can you not just put the tasks into a queue as they arrive, and then service the queue with some number of worker threads? – Oliver Charlesworth Mar 25 '18 at 17:37
  • @ThorbjørnRavnAndersen the bottleneck is that all 500 of these requests are serviced at the same time. I guess the JVM runs out of memory at some point and the server goes down – jaletechs Mar 25 '18 at 17:39
  • Sounds like you could benefit immensely by recreating this in the lab and find out what actually happens before you start fixing it. – Thorbjørn Ravn Andersen Mar 25 '18 at 17:43
  • @OliverCharlesworth this sounds good. But how would I start the processing? It's a web application, and I would need this service to run at all times, servicing said queue – jaletechs Mar 25 '18 at 17:44
  • @ThorbjørnRavnAndersen I've observed this in production. JasperReports are being generated, and one report could take up to a minute to be generated. This includes initializing jasper objects and then rendering the reports. Now 500 requests at a time to do this would take up a lot of system resources, hence the idea to order them. One client could get emailed his report now, and another could get emailed his in ten minutes time (depending on the order) – jaletechs Mar 25 '18 at 17:48
  • I would have used Kafka in this case; please check https://kafka.apache.org/intro to see whether it is suitable for your case – dkb Mar 25 '18 at 19:58
  • okay @dkb I'll look into it – jaletechs Mar 25 '18 at 21:26
  • Can you explain in more detail what this second system will do? It looks like it should be a kind of "smart queue" that holds the requests, controls their order, and dispatches them to your current system in that order. If that is correct, won't it be overkill to create a new system just for this task? – contrapost Mar 31 '18 at 10:11
  • @contrapost As it is, generating these reports causes the entire system to lag for everyone, whether or not they are the ones generating reports. Again, multitenancy is the major complexity I have in the current system, and every solution becomes complicated by it. I'm just trying to move some of the heavy tasks to a different system, so the experience can be smooth on the existing system. That means the new system comes into play only when there's heavy reporting to be done. But you do mention a "smart queue"; any idea how I could implement that in as simple a manner as possible? – jaletechs Apr 01 '18 at 11:19

2 Answers


You can try to implement a point-to-point messaging model with one sender and several receivers. In this case the queue that holds all the messages from the sender controls consumption in such a way that each message is received by only one receiver.

NB! The queue in this case won't guarantee the order of receiving, but as I understand it, that is not an issue in your case.

So, bottom line: try to implement the infrastructure on top of the Java Message Service (JMS) API.

By the way, if it's not critical, you can set an expiration period for the messages and increase the stability of your application.
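
To make this concrete, here is a minimal point-to-point sketch, not tied to your exact system: the queue JNDI name, class names and message payload are placeholders, and limiting the MDB pool to a single consumer (so jobs run strictly one at a time) is container-specific configuration that is not shown. A stateless sender puts one message per print request on a queue, and a message-driven bean consumes them.

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;

// Hypothetical sender (separate source file in practice):
// one message per print request goes onto the queue.
@Stateless
public class PrintRequestSender {

    @Inject
    private JMSContext context;                     // JMS 2.0 simplified API (Java EE 7 / GlassFish 4)

    @Resource(lookup = "jms/printRequestQueue")     // placeholder JNDI name for the queue
    private Queue printQueue;

    public void enqueue(String requestId) {
        context.createProducer().send(printQueue, requestId);
    }
}

// Hypothetical receiver (separate source file in practice):
// the container hands each message to exactly one MDB instance.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationLookup",
                                  propertyValue = "jms/printRequestQueue"),
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue")
})
public class PrintRequestConsumer implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String requestId = message.getBody(String.class);
            // Look up the request by this id and generate/email the report here.
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}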

contrapost
  • Order does matter in how the requests would be handled (first-come-first-serve). It is interesting to read up on MDBs though and how they help with JMS; it looks like a plausible solution. – jaletechs Mar 25 '18 at 21:31
  • The spec doesn't guarantee the order, but it is up to the concrete implementation (some give the possibility to set the order, for example http://activemq.apache.org/how-do-i-preserve-order-of-messages.html). According to observations from several programmers, the default order is FIFO if you have one sender. – contrapost Mar 25 '18 at 23:50
  • On the other hand multithreading won't guarantee any order at all as far as I understand. – contrapost Mar 25 '18 at 23:51
  • Apache ActiveMQ. That has come up a lot in my research. I'll look into it – jaletechs Mar 26 '18 at 08:56
  • Great if you can update your question with the solution you choose to go with. In my opinion your question covers an important issue in EE development. – contrapost Mar 26 '18 at 10:26
  • I haven't decided on which solution to go with yet. There are factors to consider in my case like multitenancy, autoscaling on AWS servers etc. These little things make it hard to decide which way to go. But I see the final solution as some sort of hybrid of the ones mentioned here. I'll definitely update the question as soon as I have a direction. – jaletechs Mar 26 '18 at 11:52

After a month of some serious work, I arrived at a solution which works and has been implemented already.

Based on my first edit, I did create a new system, but not in the way I initially thought. The new system is capable of generating the reports I need. Its only duty is to receive requests from the existing system, persist them in a database, and generate the reports for those requests when appropriate.
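
To give an idea of the persistence side, the persisted request boils down to something like the sketch below. The field names and table name here are simplified placeholders rather than my exact entity; the important parts are a status flag and a creation timestamp so that pending requests can be fetched oldest-first.

import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
import java.util.Date;

@Entity
@Table(name = "report_request")
public class ReportRequest {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // PENDING when the request arrives, FAILED if generation threw, DONE once emailed.
    @Enumerated(EnumType.STRING)
    private Status status = Status.PENDING;

    // Used to order requests first-come-first-serve.
    @Temporal(TemporalType.TIMESTAMP)
    private Date createdAt = new Date();

    // Where the finished report is emailed.
    private String recipientEmail;

    public enum Status { PENDING, DONE, FAILED }

    // getters and setters omitted for brevity
}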

Avoiding JMS: I needed 2 things from JMS at the time of asking the question, namely:

(1) a queue of some sort to order requests, and

(2) making client calls asynchronous.

But without doing much, I was able to achieve asynchronous calls with my servlet. Apparently, there is a property (asyncSupported) on the @WebServlet annotation that enables asynchronous processing of servlet requests.

@WebServlet(name = "ReportGenServlet", urlPatterns = {"/report-gen-servlet-path"}, asyncSupported = true)
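
The servlet itself then does very little: it reads the request parameters, hands the persistence work off to the async context, and responds immediately. A trimmed-down sketch follows; the parameter names, the persistNewRequest method and the 202 response are illustrative, not my exact code.

import javax.ejb.EJB;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "ReportGenServlet", urlPatterns = {"/report-gen-servlet-path"}, asyncSupported = true)
public class ReportGenServlet extends HttpServlet {

    @EJB
    private ReportRequestEntityMngrLocal requestEntityMngr;   // same EJB used by the queue processor

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        // Read parameters up front, then let a container thread do the persistence work.
        final String reportType = req.getParameter("reportType");   // illustrative parameter name
        final String email = req.getParameter("email");             // illustrative parameter name

        final AsyncContext asyncContext = req.startAsync();         // requires asyncSupported = true
        asyncContext.start(() -> {
            // Persist the request; the scheduled queue processor picks it up later.
            requestEntityMngr.persistNewRequest(reportType, email);  // illustrative method name
            // The client gets an immediate "accepted" response instead of waiting for the report.
            resp.setStatus(HttpServletResponse.SC_ACCEPTED);
            asyncContext.complete();
        });
    }
}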

Next, with a reliable persistent store for these 500 requests, which may or may not be serviced immediately, I proceeded to implement a queue processor to service the requests on a first-come-first-serve basis.

import javax.ejb.*;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Created by jaletechs on 4/23/18.
 */

/*
    A Report Request Queue Processor.
    Checks for new and failed report requests every 30 seconds
    and processes them in the order they were received.
 */
@Singleton
@LocalBean
@Startup
public class ReportRequestQueueProcessor {

    @EJB
    private SomeJobEjb ejb;
    @EJB
    private ReportRequestEntityMngrLocal requestEntityMngr;

    // Guards against overlapping runs if a timer fires while a batch is still being processed.
    private final AtomicBoolean busy = new AtomicBoolean(false);

    @Lock
    @Schedule(second = "*/30", minute = "*", hour = "*", persistent = false)
    public void atSchedule() throws InterruptedException {
        processRequests();
    }

    @Lock(LockType.READ)
    public void processRequests() throws InterruptedException {

        // If a previous batch is still running, skip this tick entirely.
        if (!busy.compareAndSet(false, true)) {
            return;
        }

        try {
            // Fetch up to 50 pending (or previously failed) requests, oldest first.
            List<ReportRequest> requestList = requestEntityMngr.getPending(50);

            for (ReportRequest request : requestList) {
                ejb.generateAndEmailReport(request);
            }
        } finally {
            busy.set(false);
        }
    }
}

I must say I liked this method a lot. The @Lock annotations, together with the AtomicBoolean check in my logic, allow currently running processing tasks to complete without interruption, even if the container fires the schedule again. This is perfect for processing tasks that have no fixed completion time. Also, with a few JPA queries, I am able to fetch requests that are old and not yet attended to, along with those that failed for some reason or another. This orderliness has greatly reduced the number of times the server goes down. I also discovered clients are willing to wait a bit, as long as they are guaranteed results within a reasonable time.
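
For completeness, the getPending(50) call behind the processor is nothing more than an ordered JPA query along these lines. This is a simplified sketch, not my exact query: ReportRequestEntityMngrLocal is the bean's local interface already referenced in the processor above, and the field and status names follow the illustrative entity sketched earlier.

import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import java.util.List;

@Stateless
public class ReportRequestEntityMngr implements ReportRequestEntityMngrLocal {

    @PersistenceContext
    private EntityManager em;

    @Override
    public List<ReportRequest> getPending(int maxResults) {
        // Oldest unprocessed (or previously failed) requests first: first-come-first-serve.
        return em.createQuery(
                "SELECT r FROM ReportRequest r "
              + "WHERE r.status IN (:pending, :failed) "
              + "ORDER BY r.createdAt ASC", ReportRequest.class)
                 .setParameter("pending", ReportRequest.Status.PENDING)
                 .setParameter("failed", ReportRequest.Status.FAILED)
                 .setMaxResults(maxResults)
                 .getResultList();
    }
}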

I should also add that I optimized my report generation code. In fact, I must admit that this should have been the first step. By the time I was done optimizing, reports that previously took 92 seconds to generate were generating in 13 seconds. That's a lot of improvement.

So, this is how I solved the problem. Questions, comments, suggestions would be appreciated. Thanks for all the help.

jaletechs