
Problem statement

I think the title says it all: I'm looking for a way to parse a String containing the body of a multipart/form-data HTTP request. I.e. the contents of the string would look something like this:

--xyzseparator-blah
Content-Disposition: form-data; name="param1"

hello, world
--xyzseparator-blah
Content-Disposition: form-data; name="param2"

42
--xyzseparator-blah
Content-Disposition: form-data; name="param3"

blah, blah, blah
--xyzseparator-blah--
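On the wire, the line breaks in such a body are CRLF (\r\n), and Commons FileUpload's parser expects them, so a test string typed with plain \n line breaks will not be parsed correctly. Below is a minimal sketch for building the sample above as a Java string. `SampleMultipartBody.build` is a hypothetical helper: it writes only the Content-Disposition header and assumes that field names and values need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SampleMultipartBody {

    // Hypothetical test helper: builds a multipart/form-data body with CRLF
    // line breaks. Only Content-Disposition is written; names and values are
    // assumed not to need escaping.
    public static String build(String boundary, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append("--").append(boundary).append("\r\n");      // part delimiter
            sb.append("Content-Disposition: form-data; name=\"")
              .append(field.getKey()).append("\"\r\n");
            sb.append("\r\n");                                     // blank line ends the part headers
            sb.append(field.getValue()).append("\r\n");
        }
        sb.append("--").append(boundary).append("--\r\n");         // closing delimiter
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("param1", "hello, world");
        fields.put("param2", "42");
        fields.put("param3", "blah, blah, blah");
        System.out.print(build("xyzseparator-blah", fields));
    }
}
```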

What I'm hoping to obtain is a parameters map, or something similar, like this:

parameters.get("param1");    // returns "hello, world"
parameters.get("param2");    // returns "42"
parameters.get("param3");    // returns "blah, blah, blah"
parameters.keys();           // returns ["param1", "param2", "param3"]

Further criteria

  • It would be best if I didn't have to supply the separator (i.e. xyzseparator-blah in this case), but I can live with it if I do have to.
  • I'm looking for a library-based solution, preferably from a mainstream library (like "Apache Commons" or something similar).
  • I want to avoid rolling my own solution, but at the current stage, I'm afraid I will have to. Reason: while the example above seems trivial to split/parse with some string manipulation, real multipart request bodies can have many more headers. Besides that, I do not want to re-invent (and much less re-test!) the wheel :)

Alternative solution

If there is a solution that satisfies the above criteria but whose input is an Apache HttpRequest instead of a String, that would be acceptable too. (Basically I do receive an HttpRequest, but the in-house library I'm using is built such that it extracts the body of this request as a String and passes that to the class responsible for doing the parsing. However, if need be, I could also work directly on the HttpRequest.)

Related questions

No matter how I try to find an answer through Google, here on SO, and on other forums too, the solution always seems to be to use Commons FileUpload to go through the parts. E.g.: here, here, here, here, here... However, the parseRequest method used in that solution expects a RequestContext, which I do not have (I only have an HttpRequest).

The other way, also mentioned in some of the above answers, is getting the parameters from the HttpServletRequest (but again, I only have an HttpRequest).

EDIT: In other words: I could include Commons FileUpload (I have access to it), but that would not help me, because I have an HttpRequest, and Commons FileUpload needs a RequestContext. (Unless there is an easy way to convert from HttpRequest to RequestContext, which I have overlooked.)

Attilio
  • [Apache HttpClient Mime](https://hc.apache.org/httpcomponents-client-ga/httpmime/project-reports.html) – Andreas Jan 23 '18 at 21:16
  • Can you alter the content type header? If so - @BalusC might have you covered here [Convenient way to parse incoming multipart form data parameters in a servlet](https://stackoverflow.com/questions/3337056/convenient-way-to-parse-incoming-multipart-form-data-parameters-in-a-servlet) – JGlass Jan 23 '18 at 21:31
  • @Andreas: could you elaborate on that? I briefly checked the API, but I don't really see the class which would parse the request. Also, not clear how to get to this API from the `HttpRequest`... – Attilio Jan 23 '18 at 21:39
  • @JGlass: yeah, I had seen that answer (the first 'here' link is the answer below that!), but as I said, I do not have `HttpServletRequest`, so it does not help me. – Attilio Jan 23 '18 at 21:41
  • What framework **DO** you have? It feels a bit unreasonable to be parsing this sort of input without at least one of the major web frameworks already in use. – markspace Jan 23 '18 at 21:43
  • Ahh, sorry, I hovered over your links to check but didn't catch it, my apologies. My other idea, though strange, is email libraries: they support parsing attachments, and an email attachment, I *believe*, is basically the same format as HTTP attachments – JGlass Jan 23 '18 at 21:47
  • @markspace: I have [Apache HTTP](https://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/package-summary.html) and also commons, **including** fileupload, if I want to. The problem is that I receive only an `HttpRequest`, whereas fileupload would need a `RequestContext`. I'll edit the question to make this clear. – Attilio Jan 23 '18 at 22:04
  • @CloseVoter: could you please explain what is the problem? Is there any way I can improve the question? – Attilio Jan 23 '18 at 22:05
  • Sorry, that was for building a multi-part message. See [this question](https://stackoverflow.com/q/42533237/5221149). – Andreas Jan 23 '18 at 22:09
  • Where do you receive the `HttpRequest` from? Aren't you the one creating the request and receiving a response, given that Apache HttpComponents only has a Client implementation? – Andreas Jan 23 '18 at 22:13
  • I'm still searching for a context here, along with Andreas I think. A lot of Apache Commons is/was libraries that were broken out of Tomcat. So if you "have" those, you should be running Tomcat or some similar server. So it's weird that you say you "have" Apache HTTP but not that you're running Tomcat 9 or Wildfly 10.1. I'm not saying you're wrong, just that it's weird to try to understand what the real requirement might be here. – markspace Jan 23 '18 at 22:49

1 Answer


You can parse your String with Commons FileUpload by wrapping it in a class that implements 'org.apache.commons.fileupload.UploadContext', as shown below.

I recommend wrapping the HttpRequest instead, as in your proposed alternative solution, for a couple of reasons. First, using a String means that the whole multipart POST body, including the file contents, needs to fit into memory. Wrapping the HttpRequest would allow you to stream it, with only a small buffer in memory at any one time. Second, without the HttpRequest, you'll need to sniff out the multipart boundary, which would normally be in the 'Content-Type' header (see RFC 1867).
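If the request headers are available, the boundary can be read from the Content-Type value instead of being sniffed from the body. (When wrapping the request itself, getContentType() can simply return the whole header value, since FileUpload extracts the boundary from it.) With Apache HttpCore 4.x, the header value would come from something like `request.getFirstHeader("Content-Type").getValue()`. Below is a simplified sketch of the extraction. `boundaryOf` is a hypothetical helper that assumes a well-formed header and does no validation.

```java
import java.util.Locale;

public class BoundaryFromContentType {

    // Hypothetical helper: returns the boundary parameter of a multipart
    // Content-Type header value, or null if there is none. Assumes a
    // well-formed header; no validation is done.
    public static String boundaryOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the "multipart/form-data" media type itself
            }
            // Parameter names are case-insensitive.
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("boundary")) {
                String value = p.substring(eq + 1).trim();
                // The boundary may be sent as a quoted string.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(boundaryOf("multipart/form-data; boundary=xyzseparator-blah"));
        System.out.println(boundaryOf("multipart/form-data; boundary=\"xyzseparator-blah\""));
    }
}
```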

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUpload;
import org.apache.commons.fileupload.UploadContext;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;

public class MultiPartStringParser implements UploadContext {

    public static void main(String[] args) throws Exception {
        // Read the raw multipart body from the file given on the command line.
        String s = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        MultiPartStringParser p = new MultiPartStringParser(s);
        for (Map.Entry<String, String> entry : p.getParameters().entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }

    private final byte[] postBodyBytes;
    private final String boundary;
    // LinkedHashMap keeps the fields in the order they appear in the body.
    private final Map<String, String> parameters = new LinkedHashMap<String, String>();

    public MultiPartStringParser(String postBody) throws Exception {
        // Encode once, using the same charset reported by getCharacterEncoding(),
        // so that getInputStream() and contentLength() agree on the byte count.
        this.postBodyBytes = postBody.getBytes(StandardCharsets.UTF_8);
        // Sniff out the multipart boundary: the body is expected to start with
        // "--<boundary>" and a line break; trim() drops the trailing '\r'.
        int endOfFirstLine = postBody.indexOf('\n');
        if (!postBody.startsWith("--") || endOfFirstLine < 0) {
            throw new IllegalArgumentException("Body does not start with a boundary line");
        }
        this.boundary = postBody.substring(2, endOfFirstLine).trim();
        // Parse out the parameters.
        final FileItemFactory factory = new DiskFileItemFactory();
        FileUpload upload = new FileUpload(factory);
        List<FileItem> fileItems = upload.parseRequest(this);
        for (FileItem fileItem : fileItems) {
            if (fileItem.isFormField()) {
                // Decode explicitly: without an argument, getString() uses the part's
                // declared charset and otherwise falls back to ISO-8859-1.
                parameters.put(fileItem.getFieldName(), fileItem.getString(getCharacterEncoding()));
            } // else it is an uploaded file
        }
    }

    public Map<String, String> getParameters() {
        return parameters;
    }

    // The methods below here are to implement the UploadContext interface.
    @Override
    public String getCharacterEncoding() {
        return "UTF-8"; // You should know the actual encoding.
    }

    // This is the deprecated method from RequestContext that unnecessarily
    // limits the length of the content to ~2GB by returning an int.
    @Override
    public int getContentLength() {
        return -1; // Don't use this
    }

    @Override
    public String getContentType() {
        // Use the boundary that was sniffed out above.
        return "multipart/form-data; boundary=" + this.boundary;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        return new ByteArrayInputStream(postBodyBytes);
    }

    @Override
    public long contentLength() {
        // Length in bytes, matching what getInputStream() delivers.
        return postBodyBytes.length;
    }
}
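A note on contentLength(): it should report the size of what getInputStream() delivers, in bytes, like an HTTP Content-Length header. String.length() counts UTF-16 chars instead, so with UTF-8 the two agree only for pure-ASCII bodies. A small standard-library demonstration (the class name `ByteLengthDemo` is just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {

    // Length of the body as it would actually be streamed: UTF-8 bytes, not UTF-16 chars.
    public static long byteLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "h\u00e9llo";               // "héllo": 5 chars
        System.out.println(value.length());       // 5
        System.out.println(byteLength(value));    // 6, because 'é' takes two bytes in UTF-8
    }
}
```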
roninjoe
  • I get an error with the parseRequest method where Java wants javax.servlet.http.HttpServletRequest imported; however, I don't see that import in your example, and I would like not to import it, as I don't use it anywhere else and it would introduce another dependency. Also, we're not using the overload that takes an HttpServletRequest but the one that takes a RequestContext, so we're not even using it. Do you have any suggestions? – gabriel Aug 09 '19 at 07:15
  • @gabriel This code has a transitive dependency on HttpServletRequest through the "org.apache.commons.fileupload.FileUpload" class. Normally, you would use a build tool like Maven, as described in [this post](https://stackoverflow.com/questions/1370414/how-to-add-the-servlet-api-to-my-pom-xml), but you could also just download the jar(s) and put them on the classpath manually if you'd prefer. You should not need to put an import statement in this class. – roninjoe Aug 15 '19 at 19:30