
Note: this question is closely related to "Is it acceptable for a server to send a HTTP response before the entire request has been received?", with the difference that (1) I'm not sending an error, I'm sending a 200 OK, and (2) I control both the client and server, so I don't really care about browser support.

Context: I am implementing a Java HTTP client and server for managing files. In particular an "upload" query contains a file path and the file body, and the server responds with a numerical identifier for the file. However if a file with the same path has already been uploaded, the server will simply respond with the previously generated identifier.

Concretely: if I write the server as follows (sparkjava)

put(url, (req, res) -> {
  Item existing = lookForExistingItem(req);
  if (existing != null) {
     return existing.getId();
  }
  /* Otherwise, consume input, save, generate id and return that */
});

... then the server will respond with the id and close the connection before the client has finished sending data. If I write the client as follows:

final HttpURLConnection connection = (HttpURLConnection) new URL(...).openConnection();
connection.setDoOutput(true);
connection.setRequestMethod("PUT");
ByteStreams.copy(fileInput, connection.getOutputStream());
final String response = CharStreams.toString(new InputStreamReader(connection.getInputStream()));

then an IOException is thrown during the copy operation due to the closed connection. After that point I am not able to access the connection's InputStream anymore.

My Question: how can I make this work? If I change the server to consume the whole input and throw it away, it works, but it feels like wasting resources (some of the files being uploaded may be videos weighing hundreds of megabytes). Is there any way to change the client code to deal with that scenario?

tendays
  • question: what if the user wants to upload a 'newer' version of the video? the file will contain the same path (including file) name but user won't be able to since it will receive the identifier of the previously loaded file. – blurfus May 31 '20 at 15:58
  • generally speaking, you cannot process a request if you have not received *all* of the request. What you can do is a 2-step approach: 1) check if the file has been loaded already. If so, return the identifier - if not, 2) upload the file - two separate requests – blurfus May 31 '20 at 16:00
  • @blurfus: thanks for the comments. 1. Uploaded files are explicitly immutable in my application. 2. Several people answered related questions saying it should be allowed (at least when returning errors, not sure about 2xx case though). 3. The two-step approach is a good one for my case, indeed, however I would still like to know which of the server or client in my current case is wrong. – tendays May 31 '20 at 16:16
  • I'd probably use the same approach as browsers do for CORS: they do a pre-flight check (an OPTIONS request to the intended URL) and, if the response headers do not contain the necessary info, they do not proceed to the actual HTTP request. This would be similar: send a simplified version of the POST request (i.e. a GET request with the file path) and see if the response is 404 (i.e. not found); if so, proceed with the POST, else retrieve the ID from the response (or other suitable handling) – blurfus May 31 '20 at 17:27
  • @tendays using above method, clients who try to upload same files concurrently would see the same result for their initial GET request i.e file not found coz its not on the server and both of them will start uploading the file. – printfmyname May 31 '20 at 17:39
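The two-step approach suggested in the comments above could look roughly like the following on the client side. This is only a sketch under assumptions that are not in the question: the class name `TwoStepUploader`, the method `uploadIfAbsent`, and the convention that a GET on the file's URL returns 200 with the identifier when the path is already known and 404 otherwise are all hypothetical. It uses PUT for the upload, as in the question's code, and needs Java 9+ for `transferTo` and `readAllBytes`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical client helper: ask first, upload only if the server does not know the path. */
final class TwoStepUploader {

    /**
     * Returns the identifier the server reports for {@code url}.
     * Assumes (not from the question) that GET answers 200 + id for a known
     * path and 404 for an unknown one, and that PUT answers with the new id.
     */
    static String uploadIfAbsent(URL url, InputStream body) throws IOException {
        // Step 1: cheap existence check, no file data is sent.
        HttpURLConnection check = (HttpURLConnection) url.openConnection();
        check.setRequestMethod("GET");
        int status = check.getResponseCode();
        if (status == HttpURLConnection.HTTP_OK) {
            return readBody(check);
        }
        if (status != HttpURLConnection.HTTP_NOT_FOUND) {
            throw new IOException("Unexpected status from existence check: " + status);
        }

        // Step 2: the path is unknown, so send the whole file.
        HttpURLConnection upload = (HttpURLConnection) url.openConnection();
        upload.setDoOutput(true);
        upload.setRequestMethod("PUT");
        upload.setChunkedStreamingMode(0); // stream the body instead of buffering it in memory
        try (OutputStream out = upload.getOutputStream()) {
            body.transferTo(out);
        }
        return readBody(upload);
    }

    /** Reads the whole response body as UTF-8 text. */
    private static String readBody(HttpURLConnection connection) throws IOException {
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
        }
    }
}
```

As the last comment above points out, this does not remove the race between two clients that both see a 404 and both start uploading; the server still has to decide what to do with the second PUT.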

1 Answer


You could break that call into several requests, assuming that the files are big enough that making multiple requests consumes far fewer resources than transferring a partial file.

enum UploadStatus {
   INITIALIZED,
   STARTED,
   UPLOADED,
   ERROR
}

My Suggestion:

  1. Keep a static ConcurrentMap<String, UploadStatus> keyed by file name (or a DB entry) to track file upload statuses.
  2. Create an endpoint to check and set a file's status.
  3. The client first makes a request to that endpoint.
    • If the file is not on the map yet, or its status is UploadStatus.ERROR, set its status on the map to UploadStatus.INITIALIZED and let that client (client A) know it can upload the file. (Do this in a synchronized block.)
  4. If the file exists and its status is UploadStatus.INITIALIZED, let that client (client B) know it is being uploaded. For the sake of UX, you could make client B poll for the file status until it becomes ERROR or UPLOADED and then take appropriate action, i.e.
    • Re-upload the file on UploadStatus.ERROR
    • Show an "uploaded" message on UploadStatus.UPLOADED
  5. Once the server receives the request to upload the actual file from client A, keep the file's upload status up to date so that, on error, other clients such as client B can re-upload the failed file.
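The check-and-set from the steps above could be sketched as follows. The class name `UploadRegistry` and its methods are hypothetical, not part of any framework, and the sketch swaps the explicit synchronized block for `ConcurrentHashMap.compute`, which runs the check and the update atomically for a given key. It also assumes that a path may be claimed when it has no entry yet or when the previous attempt ended in ERROR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory registry of upload statuses, keyed by file path. */
final class UploadRegistry {

    enum UploadStatus { INITIALIZED, STARTED, UPLOADED, ERROR }

    private final ConcurrentMap<String, UploadStatus> statuses = new ConcurrentHashMap<>();

    /**
     * Atomically claims {@code path} for upload.
     * Returns true if the caller may upload (no entry yet, or the last attempt
     * failed); returns false if another client already holds or finished it.
     */
    boolean tryClaim(String path) {
        boolean[] claimed = {false};
        // compute() performs the check and the update as one atomic step per key.
        statuses.compute(path, (key, current) -> {
            if (current == null || current == UploadStatus.ERROR) {
                claimed[0] = true;
                return UploadStatus.INITIALIZED;
            }
            return current; // leave the existing status untouched
        });
        return claimed[0];
    }

    /** Moves a claimed path to STARTED once the file body begins to arrive. */
    void markStarted(String path) {
        statuses.replace(path, UploadStatus.INITIALIZED, UploadStatus.STARTED);
    }

    /** Records a successful upload. */
    void markUploaded(String path) {
        statuses.put(path, UploadStatus.UPLOADED);
    }

    /** Records a failed upload so that another client may claim the path again. */
    void markError(String path) {
        statuses.put(path, UploadStatus.ERROR);
    }

    /** Returns the current status, or null if the path was never claimed. */
    UploadStatus statusOf(String path) {
        return statuses.get(path);
    }
}
```

In this sketch client A would call `tryClaim` and get `true`, while client B calling it for the same path gets `false` and can poll `statusOf` until it sees UPLOADED or ERROR.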

Doing the file status check and set in a single synchronized block is important to avoid a race condition when setting the file status. Also, that enum is just to explain the general high-level steps. Since you already have Guava, you could use a Guava Cache with time-based eviction for storing the file statuses.

printfmyname
  • Thanks for the detailed suggestion. Are you saying that, in its current state, the server is in violation of the HTTP protocol? Should I file a bug to the Sparkjava framework? – tendays Jun 06 '20 at 10:32
  • I don't think there is any problem with any of the components you use. It's just that the nature of the task requires some atomicity, i.e. from the time the first client starts to check whether a file exists until the end of the file upload, other clients shouldn't be able to upload the same file. These types of things are not part of the HTTP protocol or the Spark framework; the user needs to implement them. – printfmyname Jun 06 '20 at 15:15
  • Ok because the file upload thing was for context but that isn't my question. What I'm really interested in is knowing if an early HTTP 200 answer is allowable, and if so, how a Java client can deal with that. I do however appreciate the time you spent writing your answer and upvoted it. – tendays Jun 07 '20 at 08:37
  • If you return an early HTTP 200 (I assume you mean when the file is already uploaded), the Java client can just let the user know that the file already exists. Or you could send a 409 saying that someone already uploaded the same file. But when you receive that response, it is totally up to you to decide how you want to display the message to the client. You could show an alert or a message box with some description. FYI, you could use connection.getResponseCode() to get the status code and react as you wish. – printfmyname Jun 07 '20 at 19:16
  • Maybe I'm not able to articulate my question properly. See the title: (1) does HTTP allow a server to return a 2xx and a body before the client had time to send the entire body, whatever the reason for doing that might be? (2) How to let a Java client detect that situation, access the status code and response body? What it should do once it has that information is out of scope. – tendays Jun 13 '20 at 08:58
  • I tried accessing the input stream with `connection.getInputStream()` but that seems to kill the output stream if no input is available. And if the client keeps trying to transmit through the `OutputStream` after the server returned and closed the connection, the client gets an `IOException`, and trying to do `connection.getInputStream()` after catching that exception fails. Trying to get the `InputStream` early and using it from the catch clause actually closes the output immediately, so the server sees an empty body, even if the client doesn't try to read response data. – tendays Jun 13 '20 at 09:02
  • Empirically, it appears that at least the answer to the second question is that it is not possible to access a server doing that. I don't know if the server complies with the HTTP protocol or not, but I guess it's moot as long as I can't use it. – tendays Jun 13 '20 at 09:04