
A webapp in my project offers a "download CSV file" feature based on a search by the end user. It does the following:

It opens a file named "download.csv" (always that fixed name, not one generated by File.createTempFile(String prefix, String suffix, File directory)), writes rows of data from a SQL recordset to it, and then uses FileUtils to copy that file's content to the servlet's OutputStream.

The recordset is based on search criteria, such as 1st Jan to 30th March.

Can this lead to a case where the file holds the contents of two users who submit different date ranges or other filters at the same time, so that the JVM processes the requests concurrently?

Right now we are in dev and there is very little data.

I know we can write automated tests to test this, but wanted to know the theory.

I suggested using the OutputStream of the HTTP response: pass it to the service layer as a plain OutputStream and either write to it directly or wrap it in a BufferedWriter and write to that.

The only downside I can see is that data reaches the response more slowly than with the file copy, because iterating through a large recordset takes time. But the total request time should be less, shouldn't it? The time to write the rows is the same either way, and the file approach adds the time to copy from the file to the servlet output stream.
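A minimal sketch of that direct-streaming idea, for illustration only. The class `CsvStreamer`, the method `writeCsv`, and the `Iterable<String[]>` standing in for the SQL recordset are assumptions, not the project's actual code. In a servlet, `out` would be the response's output stream.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: names are illustrative, not from the original project.
class CsvStreamer {

    // Writes each row as one CSV line directly to the given stream.
    // No quoting or escaping is done here; real CSV output would need it.
    static void writeCsv(Iterable<String[]> rows, OutputStream out) throws IOException {
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String[] row : rows) {
            writer.write(String.join(",", row));
            writer.write('\n');
        }
        // Flush, but leave closing to whoever owns the stream (e.g. the servlet container).
        writer.flush();
    }
}
```

Because every request writes to its own stream and nothing is shared between requests, two concurrent searches cannot end up in each other's output.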

Anyone done testing around this and have test cases or solutions to share?

tgkprog
  • Why do you say that writing the file directly into the OutputStream is slower than creating the file in the file system and later sending it over the wire? – Carlitos Way Apr 19 '18 at 07:07
  • @carlitos-way Oh, I meant versus copying from the file to the servlet output stream: if the recordset has many records, iterating over it will take time. (I don't think it makes a difference though, and writing to the servlet output stream will take less time overall for the complete request.) – tgkprog Apr 19 '18 at 09:23
  • 2
    I wouldn't put the data from the request in a file if it is not needed. Better yet, if you can, iterate through the data set and write to the output stream immediately. The client then receives the response right away and starts downloading the data. But if you really need the file, I would give it a (partially) randomly generated name (like a UUID) so each user has their own file and the issue goes away. But FIRST, write a unit test that shows your current situation fails, then fix it and show that it doesn't fail any more. – Wietlol Apr 20 '18 at 21:04
  • @wietlol Okay. I know that instead of a UUID the safer bet is File.createTempFile(String prefix, String suffix, File directory), which does what you say. But I wanted to know the advantages of the other way (direct to the servlet output stream) and any test results for it. – tgkprog Apr 21 '18 at 22:00
  • 1
    The advantages of writing directly to the output stream are mostly that you won't need any hard drive storage during the transfer, your code will most probably be clearer about what it does, and your service responds faster. I don't have any test results, but you can easily notice the difference if you query large amounts of data. – Wietlol Apr 21 '18 at 22:14
  • @wietlol Okay, thanks. I started the bounty. Can the question be clearer? Someone downvoted it without commenting why, so I'm wondering whether I can improve it. – tgkprog Apr 22 '18 at 14:02

1 Answer


Well, that is a tricky question if you really want to go in depth on both parts.

Concurrency

As you wrote, this "same name" approach can lead to a race condition on a multithreaded system (and almost all systems are like that nowadays). I have seen code written like this, and it can cause a lot of trouble. The resulting file could contain not only lines from both searches but also characters from both merged within a single line.

Examples:

Thread 1 wants to write: 123456789\n
Thread 2 wants to write: abcdefghi\n

The output could vary in the following ways:

1st case:

123456789
abcdefghi

2nd case:

1234abcd56789
efghi

I would definitely use at least unique names (for example from UUID.randomUUID()) to "hot-fix" the problem.
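A sketch of that hot-fix, assuming the file is only needed for the duration of one request. It uses `Files.createTempFile` from the JDK, which generates a unique name per call and so serves the same purpose as a UUID-based name. The class `PerRequestCsvFile` and the method `export` are hypothetical names, and the list of lines stands in for the rows read from the recordset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: names are illustrative, not from the original project.
class PerRequestCsvFile {

    // Writes the lines to a file that is unique to this call, copies it to
    // the given stream, and removes the file afterwards.
    static void export(List<String> csvLines, OutputStream out) throws IOException {
        // Each call gets its own file name, so concurrent requests never share a file.
        Path tmp = Files.createTempFile("download-", ".csv");
        try {
            Files.write(tmp, csvLines, StandardCharsets.UTF_8);
            Files.copy(tmp, out);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This keeps the existing write-then-copy flow and only removes the shared name, which is why it is a hot-fix and not a redesign.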

Disk IO

Disk IO is a tricky thing if you go in depth. The speeds can vary over a wide range. In the JVM you can have blocking and non-blocking IO as well. The blocking one could wait until the data is really on the disk and the other will do some "magic" to flush the file later. There is a good read in here.

TL;DR: As a rule of thumb, it is better to keep things in memory (if they fit) and not bother with the disk. If that memory belongs to the request's own thread, you avoid the concurrency problem as well. So in your case it could be better to rewrite that part to use memory only and write to the output.
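A minimal sketch of the memory-only variant, assuming the whole result fits comfortably in memory. `InMemoryCsv`, `render` and `send` are hypothetical names, and the list of rows stands in for the recordset. The buffer is a local variable, so it is only ever seen by the thread handling that request.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: names are illustrative, not from the original project.
class InMemoryCsv {

    // Builds the CSV in a local buffer. Nothing here is shared between threads.
    // No quoting or escaping is done; real CSV output would need it.
    static byte[] render(List<String[]> rows) {
        StringBuilder csv = new StringBuilder();
        for (String[] row : rows) {
            csv.append(String.join(",", row)).append('\n');
        }
        return csv.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Sends the finished bytes in one go; no file is involved.
    static void send(List<String[]> rows, OutputStream out) throws IOException {
        out.write(render(rows));
        out.flush();
    }
}
```

One side effect of buffering first is that the size is known before anything is sent, at the cost of holding the whole result in memory.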

Hash
  • 1
    This is a good answer. I was really hoping for some definitive proof (data/ test cases). But you can have the bounty. Thanks – tgkprog Apr 26 '18 at 19:16