I have a REST web service built with J2EE running on top of Tomcat that has 3 interfaces. Two of those interfaces respond quickly (within milliseconds), but the third one blocks for 1–2 seconds before it can send out the HTTP response. That delay is inherent in the nature of what this web service does, and there's nothing I can do to make it block for less time.
The application runs on Amazon EC2 on RHEL 7 and has Apache Web Server 2.4.6 in front of it acting as a reverse proxy which just handles LDAP authentication.
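For context, the proxy setup is roughly like this (a minimal sketch; the hostnames, ports, and LDAP details are placeholders, not my exact configuration):

```apache
# Illustrative sketch only - hostnames, port, and LDAP URL are placeholders.
<VirtualHost *:443>
    # LDAP authentication handled by Apache (mod_authnz_ldap)
    <Location "/">
        AuthType Basic
        AuthName "Restricted"
        AuthBasicProvider ldap
        AuthLDAPURL "ldap://ldap.example.com/ou=users,dc=example,dc=com?uid"
        Require valid-user
    </Location>

    # Reverse-proxy everything to the Tomcat instance (mod_proxy / mod_proxy_http)
    ProxyPass        "/" "http://localhost:8080/"
    ProxyPassReverse "/" "http://localhost:8080/"
</VirtualHost>
```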
The requirement is that every interface must separately handle 1000 requests per second without a significant drop in response time compared to no load, and 99.5% of the requests must succeed. For the blocking interface, sustaining 1000 requests per second with a 1–2 second response time therefore means a bit over 1500 concurrent users.
I composed a performance test, and the application itself, without Apache, can easily serve an even higher number of concurrent users without a significant drop in response time. However, if I route the test through the Apache proxy, response times on the blocking interface degrade dramatically: even with 500 concurrent users, 10% of the requests take longer than 4 seconds. Even worse, if I run the test for hours, Apache consumes so much memory that it makes other applications on the same operating system crash (without Apache running I have over 2 GB of free memory). I've played with the instructions from this SO question: How do you increase the max number of concurrent connections in Apache? Things went a bit better, but not by much.
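The tuning I experimented with was along these lines (values are illustrative, adapted from that answer, and assume the event MPM instead of RHEL 7's default prefork):

```apache
# Illustrative values only, not my exact config.
# /etc/httpd/conf.modules.d/00-mpm.conf - use the event MPM instead of prefork
LoadModule mpm_event_module modules/mod_mpm_event.so

<IfModule mpm_event_module>
    ServerLimit             16
    StartServers            4
    ThreadLimit             64
    ThreadsPerChild         64
    MaxRequestWorkers       1024   # ServerLimit * ThreadsPerChild
    MaxConnectionsPerChild  0      # never recycle worker processes
</IfModule>
```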
So my question is: does anyone have experience configuring Apache as a proxy that handles over a thousand concurrent requests with 1–2 second latency while at the same time serving low-latency requests (the other two interfaces) well? If so, how did you configure your Apache to achieve that?