
Given:

  • a web application running on WebSphere and an Oracle database with large tables
  • a JSF website that allows the user to download parts of the data as CSV files
  • normal access to the DB is via JPA (OpenJPA)
  • potential file download sizes of up to 500 MB to 1 GB

Problem:

How can the user download such large data exports from the DB via the web application to his client's hard drive without OutOfMemory exceptions and without storing/buffering the complete data on the server?

KFleischer
  • An idea I have is streaming the data. But how do I stream the data from the DB to the client's browser? – KFleischer Nov 28 '18 at 16:12
  •
    JPA has limitations, and I think you've hit one of them, unless there is some capability to stream. This isn't an answer to your question, but using JDBC in a Servlet and doing a read/write from the ResultSet to the output stream is an efficient approach. – Steve11235 Nov 28 '18 at 16:17
  • Or streaming write to a temporary file on the server and stream that to the client (via a servlet or http://showcase.omnifaces.org/servlets/FileServlet) if the db does not allow direct streaming – Kukeltje Nov 28 '18 at 16:31
  • Ok, JPA is not a "must have". Is [this](https://stackoverflow.com/questions/685271/using-servletoutputstream-to-write-very-large-files-in-a-java-servlet-without-me) what you would recommend for streaming the data to the client? How would you read data via a stream from the DB? – KFleischer Nov 28 '18 at 16:32

1 Answer


You're likely going to run into a number of limits trying to do this in a single HTTP request.

Streaming and asynchronous processing are two possible solutions.

Streaming

It looks like JPA 2.2 has added streaming support. To prevent out-of-memory conditions you'll likely need to tune your JVM, and you'll also likely need to adjust the JDBC fetch size to balance your DB performance against your client performance.
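A minimal sketch of such a streaming query (the `Order` entity is a placeholder; `getResultStream()` is the JPA 2.2 API, and `openjpa.FetchBatchSize` is OpenJPA's query hint that maps to the JDBC fetch size):

```java
import java.util.stream.Stream;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

// Sketch: stream the query result instead of materializing it as a list.
// "Order" is a placeholder entity; "openjpa.FetchBatchSize" controls how
// many rows the JDBC driver fetches per round trip.
public class ExportQuery {
    public static Stream<Order> streamOrders(EntityManager em) {
        TypedQuery<Order> q = em.createQuery("SELECT o FROM Order o", Order.class);
        q.setHint("openjpa.FetchBatchSize", 1000);
        return q.getResultStream(); // new in JPA 2.2; close it when done
    }
}
```

The returned `Stream` should be closed (e.g. via try-with-resources) so the underlying cursor is released.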

You can then stream/buffer the results back to the client.
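As a rough sketch of the client-facing half, the CSV writing can target a plain `java.io.Writer` (in a servlet this would be `response.getWriter()`, flushed periodically), so only one row is ever held in memory; the escaping rule below follows RFC 4180:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: write one CSV row at a time to a Writer (in a servlet this would
// be response.getWriter()), so the server never buffers the whole export.
public class CsvWriter {

    // Quote a field if it contains a comma, quote, or line break,
    // doubling embedded quotes (RFC 4180 style).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static void writeRow(Writer out, List<String> fields) throws IOException {
        out.write(fields.stream().map(CsvWriter::escape)
                .collect(Collectors.joining(",")));
        out.write("\r\n");
    }
}
```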

There are problems with this approach, though. What happens when there's a momentary loss of the network connection, or someone accidentally closes their browser? Can you even provide a Content-Length response header to help gauge the time remaining? A better approach is:

Asynchronous

The steps could go something like this:

  1. Client submits a request to the server.
  2. Server kicks off an asynchronous process (a worker thread or a JMS message).
  3. The async process generates the file and stores it temporarily on the file system.
  4. When the process completes, it sends an email to the client to pick up their file. This is known as a claim check. Instead of an email, the server could also return a URL that the user refreshes to see when their file is ready for pickup.
  5. After a "reasonable" amount of time the file is deleted.
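The steps above can be sketched in plain Java; a production version on the question's stack would post a JMS message instead of using an in-process thread pool, and `ExportService`, the token format, and the export directory here are illustrative assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the claim-check flow with a plain ExecutorService; on the
// question's WebSphere stack the worker would be a JMS queue plus an MDB.
public class ExportService {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Path exportDir;

    public ExportService(Path exportDir) {
        this.exportDir = exportDir;
    }

    // Steps 1-2: accept the request and return a claim-check token at once.
    public String submit(Iterable<String> csvRows) {
        String token = UUID.randomUUID().toString();
        pool.submit(() -> {
            try {
                // Step 3: generate the file on the server's file system.
                Files.write(exportDir.resolve(token + ".csv"), csvRows);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        return token;
    }

    // Step 4: the client polls (or is emailed a link) until the file exists.
    public boolean isReady(String token) {
        return Files.exists(exportDir.resolve(token + ".csv"));
    }
}
```

Step 5 (deleting the file after a "reasonable" time) could be a scheduled task that sweeps `exportDir` for files older than some cutoff.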

This approach can be configured to allow auto-resume capabilities, and it takes network uncertainty and closed browser tabs out of the equation. Overall, it also alleviates the strain of having to handle re-requests. Using JMS also gives you the ability to scale this solution horizontally instead of vertically.

Jason Armstrong