I have a large DB table which the user must be able to export as CSV.
Backend -> Query DB -> Transform to CSV -> download file over HTTP
For a small amount of data this is trivial, but the table will have millions of records in it, so I want to avoid loading all of them into memory, transforming everything and only then starting the download.
What I want is a way to stream the data: only touch ~1000 records at a time, flush them directly down the HTTP pipe to the browser for download, and free the memory right after.
This is what I have tried:
@PostMapping("/csv/{tableName}")
public void getFilteredExceptionsForCsv(
@NotNull @PathVariable String tableName,
@NotNull @RequestBody EnterpriseGetRowsRequest req,
HttpServletResponse response) {
response.setContentType("text/csv");
String headerKey = "Content-Disposition";
String headerValue = String.format("attachment; filename=\"%s\"", "filename.csv");
response.setHeader(headerKey, headerValue);
// simple select query
String sql = gridRowsRepositoryCustom.getFilteredExceptionsForCsvQuery(req, tableName, getCurrentUser());
try {
Statement statement = null;
statement = db.getDataSource().getConnection().createStatement();
statement.setFetchSize(50);
ResultSet rs = statement.executeQuery(sql);
logger.debug("EXECUTED QUERY");
ResultSetMetaData rsmd = rs.getMetaData();
int columnsCount = rsmd.getColumnCount();
while (rs.next()) {
String[] values = new String[columnsCount];
for (int i = 1; i <= columnsCount; i++) {
String columnValue = rs.getString(i);
values[i-1] = columnValue;
}
String line = Stream.of(values)
.map(this::escapeSpecialCharacters)
.collect(Collectors.joining(","));
response.getWriter().println(line);
response.getWriter().flush();
}
logger.debug("CONTENT WRITTEN");
rs.close();
statement.close();
response.getWriter().close();
} catch (SQLException | IOException e ) {
logger.error("error executing sql statement {}", e.getMessage());
}
}
My expectation was that the JDBC driver would read only 50 records at a time, my loop would write them straight into the response stream, then the driver would fetch the next 50 records and so on, so the download would start as soon as the first 50 entries were flushed.
Instead, it looks like all records are fetched first, then written to the output stream, then the request is completed, and only then does the browser begin downloading the file.
Here's the log for loading 1m records:
12:03:16.542 DEBUG 27992 --- [nio-8181-exec-8] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to DataController#getFilteredExceptionsForCsv(String, RowsRequest, HttpServletResponse)
12:03:16.550 DEBUG 27992 --- [nio-8181-exec-8] d.t.z.d.repository.GridRowsRepository : EXPORT SQL: SELECT TARGET.*, ... [truncated]
12:04:20.342 DEBUG 27992 --- [nio-8181-exec-8] d.t.zebra.api.controller.DataController : EXECUTED QUERY
12:04:44.208 DEBUG 27992 --- [nio-8181-exec-8] d.t.zebra.api.controller.DataController : CONTENT WRITTEN
12:04:44.208 DEBUG 27992 --- [nio-8181-exec-8] o.s.web.servlet.DispatcherServlet : Completed 200 OK
As you can see, executing the query takes about a minute, transforming and writing to the output stream takes another 24 seconds, and only after the method finishes does the browser start downloading the file.
I also tried the Spring way of doing this with ResponseEntity&lt;StreamingResponseBody&gt;, but the result was exactly the same.
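Roughly, that attempt looked like the sketch below (reconstructed, not my exact code; the mapping path and method name are just placeholders, and it reuses the same gridRowsRepositoryCustom, db, getCurrentUser and escapeSpecialCharacters helpers as above):

@PostMapping("/csv-streaming/{tableName}")
public ResponseEntity<StreamingResponseBody> getFilteredExceptionsForCsvStreaming(
        @NotNull @PathVariable String tableName,
        @NotNull @RequestBody EnterpriseGetRowsRequest req) {
    String sql = gridRowsRepositoryCustom.getFilteredExceptionsForCsvQuery(req, tableName, getCurrentUser());

    StreamingResponseBody body = out -> {
        // same row-by-row loop as above, just writing to the OutputStream Spring hands us
        try (Connection con = db.getDataSource().getConnection();
             Statement statement = con.createStatement()) {
            statement.setFetchSize(50);
            try (ResultSet rs = statement.executeQuery(sql);
                 PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                int columnsCount = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringJoiner line = new StringJoiner(",");
                    for (int i = 1; i <= columnsCount; i++) {
                        line.add(escapeSpecialCharacters(rs.getString(i)));
                    }
                    writer.println(line);
                    writer.flush(); // push each row towards the client immediately
                }
            }
        } catch (SQLException e) {
            throw new IOException("error executing sql statement", e);
        }
    };

    return ResponseEntity.ok()
            .contentType(MediaType.parseMediaType("text/csv"))
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"filename.csv\"")
            .body(body);
}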
So what am I doing wrong? Is what I described above even possible?
Any help appreciated.