
Problem

In our webapp (Angular + JAX-RS REST backend running on WebLogic + IIS proxy) we have one REST endpoint that returns an XLSX download (octet-stream). These XLSX files can be huge (up to the XLSX limit of 1M rows).

After some time, on slow connections, the download fails (ERR_CONNECTION_RESET in Chrome devtools). The exact time when this happens varies: some days after 4-6 minutes, other days after 10-12 minutes. There is no clear pattern.

Fast(er) downloads work fine and are always successful. I have seen downloads of hundreds of MBs finish successfully in (e.g.) 8 minutes, but others fail at (e.g.) 11 minutes.

The problem is that I do not understand why the download fails and why the connection is reset. Any pointers or tips on how to test and debug this problem are welcome. As far as I understand ERR_CONNECTION_RESET just means that something reset the connection. Just looking at the response headers gives no indication on who reset it.

Question

How can I understand why the download fails and who resets the connection? The logfiles do not state which component resets the connection.

Setup

The webapp is deployed on WebLogic 12.2 on the internal network. IIS 8.5 acts as a reverse proxy making the webapp accessible on the internet.

Details

  • When I download without IIS as reverse proxy (from our internal network, but with a speed limit set in Chrome devtools), the download is always successful. I've had downloads at 50kb/s which finished fine in 2 hours. We cannot find any settings in IIS which influence this behaviour, and since the precise time of the failure varies, I am hesitant to conclude definitively that IIS causes the connection reset.

  • The WebLogic (exception) logs state that writing to the OutputStream fails because of a closed connection. No exceptions or log entries which indicate that WebLogic closed the connection.

  • Using other download speeds makes no difference. There is no direct relation between download speed and the time at which the connection is reset.

  • The download is never stalled.

  • A VPN connection does not seem to be a factor; people with and without VPN experience the same problem.

  • Changing the proxy is unfortunately not an immediate solution. This is a large corporate environment, and unless we know precisely that IIS is the problem, a change is not going to happen.
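To make the slow-download test repeatable outside Chrome devtools, a small scripted client can read the response at a fixed rate and report how many bytes arrived and how long it took before the read failed. The following is only a sketch: the class name `SlowDownload` and the method `drainThrottled` are made up for illustration, and a real run would also need whatever authentication or cookies the endpoint requires, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the application: downloads a URL slowly
// and reports where the transfer broke.
public final class SlowDownload {

    /**
     * Reads the stream to the end at roughly bytesPerSecond and returns the
     * number of bytes read. A read failure is rethrown with the byte count
     * and elapsed time in the message.
     */
    public static long drainThrottled(InputStream in, long bytesPerSecond)
            throws IOException, InterruptedException {
        byte[] buf = new byte[8192];
        long total = 0;
        long start = System.nanoTime();
        while (true) {
            int n;
            try {
                n = in.read(buf);
            } catch (IOException e) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                throw new IOException("read failed after " + total + " bytes, "
                        + elapsedMs + " ms", e);
            }
            if (n == -1) {
                return total;
            }
            total += n;
            // Sleep until the wall clock catches up with the target rate.
            long targetMs = total * 1000 / bytesPerSecond;
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (targetMs > elapsedMs) {
                Thread.sleep(targetMs - elapsedMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: SlowDownload <url> <bytesPerSecond>");
            return;
        }
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        try (InputStream in = conn.getInputStream()) {
            long bytes = drainThrottled(in, Long.parseLong(args[1]));
            System.out.println("completed: " + bytes + " bytes");
        } finally {
            conn.disconnect();
        }
    }
}
```

Running the same client once against the IIS address and once directly against WebLogic, at the same rate, would give two timings that can be lined up with the server-side logs.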

WebLogic exception

Caused by: java.net.SocketException: Socket closed
    at weblogic.socket.NIOOutputStream.convertToSocketException(NIOOutputStream.java:250) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.NIOOutputStream.access$600(NIOOutputStream.java:33) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.NIOOutputStream$BlockingWriter.flush(NIOOutputStream.java:482) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.NIOOutputStream$BlockingWriter.write(NIOOutputStream.java:334) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.NIOOutputStream.write(NIOOutputStream.java:220) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.JSSEFilterImpl.writeToNetwork(JSSEFilterImpl.java:829) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.JSSEFilterImpl.wrapAndWrite(JSSEFilterImpl.java:789) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.JSSEFilterImpl.write(JSSEFilterImpl.java:503) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.socket.JSSESocket$JSSEOutputStream.write(JSSESocket.java:154) ~[com.oracle.weblogic.server.muxers.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:628) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutput.writeChunks(ChunkOutput.java:590) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:474) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutput$3.checkForFlush(ChunkOutput.java:760) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutput.write(ChunkOutput.java:373) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:165) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:186) ~[com.oracle.weblogic.servlet.jar:12.2.1.4]
    at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:325) ~[org.glassfish.jersey.containers.jersey-container-servlet-core.jar:?]
    at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:229) ~[org.glassfish.jersey.core.jersey-common.jar:?]
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:299) ~[org.glassfish.jersey.core.jersey-common.jar:?]
    at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253) ~[?:1.8.0_261]
    at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) ~[?:1.8.0_261]
    at java.util.zip.ZipOutputStream.write(ZipOutputStream.java:331) ~[?:1.8.0_261]
    at org.apache.poi.util.IOUtils.copy(IOUtils.java:317) ~[org.apache.poi-poi-3.17.jar:3.17]
    at org.apache.poi.xssf.streaming.SXSSFWorkbook.copyStreamAndInjectWorksheet(SXSSFWorkbook.java:501) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:391) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
    at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936) ~[org.apache.poi-poi-ooxml-3.17.jar:3.17]
...
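The trace only shows that a write failed on an already closed socket. To get more out of the server side, the response stream could be wrapped so that a failure also records how many bytes had been handed over, how long the export had been running, and how long the slowest earlier write blocked. The class below is a sketch under assumptions: `DiagnosticOutputStream` is a hypothetical name, and it would be wrapped around the `OutputStream` that the `StreamingOutput` receives before passing it to `SXSSFWorkbook.write`.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper: adds transfer statistics to any write failure.
public class DiagnosticOutputStream extends FilterOutputStream {
    private final long startNanos = System.nanoTime();
    private long bytesWritten;
    private long longestWriteNanos;

    public DiagnosticOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        long before = System.nanoTime();
        try {
            // Delegate the whole chunk; the default would write byte by byte.
            out.write(buf, off, len);
        } catch (IOException e) {
            throw new IOException(describe(System.nanoTime() - before), e);
        }
        longestWriteNanos = Math.max(longestWriteNanos, System.nanoTime() - before);
        bytesWritten += len;
    }

    public long getBytesWritten() {
        return bytesWritten;
    }

    private String describe(long failedWriteNanos) {
        return "write failed: bytesWritten=" + bytesWritten
                + " elapsedMs=" + (System.nanoTime() - startNanos) / 1_000_000
                + " failedWriteMs=" + failedWriteNanos / 1_000_000
                + " longestEarlierWriteMs=" + longestWriteNanos / 1_000_000;
    }
}
```

Because the servlet container buffers output, `bytesWritten` counts bytes accepted by the container, not bytes that reached the client. A very long `failedWriteMs` would suggest the write was blocked on a peer that had stopped reading before the socket was closed.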

IIS logs

The only line I could find relevant to this problem is:

1.2.3.4, -, 9/15/2020, 9:20:14, W3SVC3, HSTWEB, 2.3.4.5, 561236, 1813, 9658662, 500, 0, GET, /api/resources/export/FOO, sorting=1,
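This line looks like the fixed, comma-separated IIS log file format. Assuming the field order Microsoft documents for that format (client IP, user name, date, time, service and instance, server name, server IP, time taken in milliseconds, client bytes sent, server bytes sent, service status code, Windows status code, request type, target, parameters), it can be decoded with a few lines of Java. The class name `IisLogLine` is made up for this sketch, and the field order is an assumption that should be checked against the logging configuration of the site.

```java
// Hypothetical parser for one line in the (assumed) IIS log file format.
public final class IisLogLine {
    public final String clientIp;
    public final long timeTakenMs;
    public final long clientBytesSent;
    public final long serverBytesSent;
    public final int serviceStatus;
    public final long windowsStatus;
    public final String method;
    public final String target;

    private IisLogLine(String[] f) {
        clientIp = f[0];
        timeTakenMs = Long.parseLong(f[7]);
        clientBytesSent = Long.parseLong(f[8]);
        serverBytesSent = Long.parseLong(f[9]);
        serviceStatus = Integer.parseInt(f[10]);
        windowsStatus = Long.parseLong(f[11]);
        method = f[12];
        target = f[13];
    }

    public static IisLogLine parse(String line) {
        // Fields are separated by a comma plus optional whitespace.
        String[] f = line.trim().split("\\s*,\\s*");
        if (f.length < 14) {
            throw new IllegalArgumentException("expected at least 14 fields, got " + f.length);
        }
        return new IisLogLine(f);
    }

    public static void main(String[] args) {
        IisLogLine e = parse("1.2.3.4, -, 9/15/2020, 9:20:14, W3SVC3, HSTWEB, 2.3.4.5, "
                + "561236, 1813, 9658662, 500, 0, GET, /api/resources/export/FOO, sorting=1,");
        long seconds = e.timeTakenMs / 1000;
        System.out.printf("%s %s -> HTTP %d, win32 %d, %d bytes sent in %d min %d s%n",
                e.method, e.target, e.serviceStatus, e.windowsStatus,
                e.serverBytesSent, seconds / 60, seconds % 60);
    }
}
```

If the assumed field order holds, the entry describes a request that ran for about 9 minutes 21 seconds, during which IIS sent 9658662 bytes and then logged HTTP status 500 with Windows status 0.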

Bossk
  • Could you bypass IIS just to see whether it works or not? Could you paste the exception on Weblogic you have described? – rcastellcastell Sep 15 '20 at 09:21
  • Yes: Like I said in my details: Bypassing IIS works fine. So I would like to determine with absolute certainty if IIS is resetting the connection. – Bossk Sep 15 '20 at 09:33
  • What about any log message on IIS? – rcastellcastell Sep 15 '20 at 11:29
  • You have said there the pattern is not clear because it varies between 8, 11 and more minutes. However, I think it could be more about how long the connection stays inactive, perhaps that is the issue. I am not an expert in IIS so, could you check this parameter connectionTimeout? [IIS parameters](https://learn.microsoft.com/en-us/iis/configuration/system.applicationhost/sites/sitedefaults/limits) – rcastellcastell Sep 15 '20 at 14:06
  • You can only use tools like Wireshark to check who resets TCP connections. – Lex Li Sep 15 '20 at 15:22
  • Interesting finding: I updated the IIS connectionTimeout from 2 minutes to 30 minutes. Instead of after ~11 minutes the download now failed after ~40 minutes. Chrome showed that the connection was stalled at around the 12 minutes mark. So, changing connectionTimeout did not fix the problem, but may give a clue on what is happening: somehow with IIS as proxy the connection is stalling after 10 minutes. Any further ideas? – Bossk Sep 15 '20 at 17:30
  • Please check the error message in the network. you can see which component is faulty in the network. – samwu Sep 16 '20 at 03:16
  • @samwu What do you mean with "error message in the network"? Do you mean checking with Wireshark? I will be setting that up later, but do you mean something else? – Bossk Sep 16 '20 at 05:28
  • I think this could be more about the design of your application; would not it be better to send an asynchronous call to generate a compressed file on a SFTP server and then getting the file when this is ready? While thinking on async I found [this](https://stackoverflow.com/questions/52943766/how-can-i-more-efficently-download-large-files-over-http) – rcastellcastell Sep 16 '20 at 07:39
  • I have no indications that the current implementation is inefficient or needs rewriting at this point, since everything works perfectly fine without a reverse proxy. Data is streamed to the browser using an OutputStream and StreamingOutput which is for XLSX files efficient and uses relatively low memory. (The XLSX file is not a static file, but generated from content in a database). I requested the IIS HTTPERR logs, which I did not know existed. Hopefully they give an indication what is wrong. – Bossk Sep 16 '20 at 08:09
  • @Bossk I mean in your browser, use F12 to check for error messages in the network. – samwu Sep 17 '20 at 09:46
