Why do we need boundaries in multipart data format?

Question

The title says it all. Suppose we are uploading multiple images; each multipart section will have sub-headers like:

Content-Disposition: form-data; name="file"; filename="mia.jpeg"
Content-Type: image/jpeg
Content-Length: 5379

Content-Length should be enough to tell the parser where this part ends and the next one starts. I'm most likely missing something, though. Can you help?

Before the parser can parse Content-Length, it needs to split the raw data into chunks. – Iłya Bursov, Aug 08 '18 at 15:06

score 1 · Answer 1 · edited Oct 07 '21 at 11:02

Why do we need boundaries in multipart data format?

Boundaries are delimiters that allow the server to split the message into chunks, or body parts. A multipart request can contain an arbitrary number of body parts. multipart/form-data requests are currently defined in RFC 7578.

Each part consists of its own content header (zero or more Content- header fields) and a body. It's also important to mention that the boundary delimiter must not appear inside any of the encapsulated parts.
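To illustrate that last constraint, here is a minimal Python sketch of how a sender might choose a boundary when it has all payloads in memory. The function name `pick_boundary` and the `----sketch` prefix are invented for this example; they come from no RFC or library.

```python
import secrets


def pick_boundary(parts):
    """Return a boundary string that occurs in none of the given byte payloads.

    Hypothetical helper: retries with a fresh random token on a collision,
    which is extremely unlikely with 128 random bits.
    """
    while True:
        candidate = "----sketch" + secrets.token_hex(16)
        needle = candidate.encode("ascii")
        if not any(needle in part for part in parts):
            return candidate


payloads = [b"first payload", b"second --payload--"]
boundary = pick_boundary(payloads)
```

Checking for the bare boundary string is stricter than necessary (the actual delimiter is CRLF, two hyphens, then the boundary), but it keeps the sketch simple.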

Another relevant document is RFC 2046, which defines multipart MIME data streams:

The body must then contain one or more body parts, each preceded by a boundary delimiter line, and the last one followed by a closing boundary delimiter line. After its boundary delimiter line, each body part then consists of a header area, a blank line, and a body area.
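The structure in that quote can be sketched as a small parser. This is a simplified Python illustration, not a compliant implementation: `split_multipart` is a made-up name, and it assumes CRLF line endings and no trailing whitespace after the boundary on the delimiter line. It shows that the parser only needs the boundary, never a Content-Length, to find where each part ends.

```python
def split_multipart(body: bytes, boundary: str):
    """Split a multipart body into (headers, content) pairs.

    Simplified sketch: assumes CRLF line endings and no transport padding.
    """
    # A delimiter line is CRLF + "--" + boundary. Prepending CRLF lets the
    # very first delimiter be matched the same way as the others.
    delimiter = b"\r\n--" + boundary.encode("ascii")
    sections = (b"\r\n" + body).split(delimiter)

    parts = []
    for section in sections[1:]:  # sections[0] is the preamble
        if section.startswith(b"--"):
            break  # closing delimiter: "--boundary--"
        # Header area, blank line, body area.
        head, _, content = section.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if line:
                name, _, value = line.partition(b":")
                key = name.decode("utf-8", "replace").strip().lower()
                headers[key] = value.decode("utf-8", "replace").strip()
        parts.append((headers, content))
    return parts


raw = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="mia.jpeg"\r\n'
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8 fake jpeg bytes\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="note"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XyZ--\r\n"
)
parts = split_multipart(raw, "XyZ")
```

Here `XyZ` stands in for the boundary that the sender would announce in the request's Content-Type header.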

score 0 · Accepted Answer · edited Oct 07 '21 at 12:12

Content-Length isn't a requirement of multipart content. The issue of using lengths is addressed in part in the old RFC:

5.2 Other data encodings rather than multipart

Various people have suggested using new mime top-level type "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of "packet" to express indeterminate-length binary data, rather than relying on the multipart-style boundaries. While this would be useful, the "multipart" mechanisms are well established, simple to implement on both the sending client and receiving server, and as efficient as other methods of dealing with multiple combinations of binary data.

That text isn't in the current one, though; length doesn't appear in it at all.

This makes particular sense if you consider a sender that sends the output of a stream as one part of a multipart post and may not know the length of that stream's data in advance. If a length were required, the sender would need to either buffer the whole stream or read it twice.
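A rough Python sketch of that streaming case follows. `stream_multipart` and the shape of its `fields` argument are assumptions made for this example, not an existing API. The generator emits each part as it reads the source, so it never needs to know a part's total length.

```python
import io


def stream_multipart(fields, boundary, chunk_size=8192):
    """Yield a multipart/form-data body piece by piece.

    `fields` is an iterable of (name, filename, content_type, fileobj)
    tuples; this layout is hypothetical, chosen only for the sketch.
    """
    for name, filename, content_type, fileobj in fields:
        yield f"--{boundary}\r\n".encode("ascii")
        yield (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n"
            "\r\n"
        ).encode("utf-8")
        # Forward the data as it arrives; no length is computed or sent.
        while chunk := fileobj.read(chunk_size):
            yield chunk
        yield b"\r\n"
    # The closing delimiter marks the end of the last part.
    yield f"--{boundary}--\r\n".encode("ascii")


source = io.BytesIO(b"abc" * 5)  # stands in for a stream of unknown length
body = b"".join(
    stream_multipart(
        [("file", "a.bin", "application/octet-stream", source)],
        "XyZ",
        chunk_size=4,
    )
)
```

Because the sender cannot scan a stream for collisions beforehand, a real sender would use a long random boundary rather than a short one like `XyZ`.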

answered Aug 08 '18 at 15:09

T.J. Crowder

Yes, it makes sense. So even if the length is omitted, servers should be able to decode the multipart data, because their parsing algorithm is based on boundaries rather than on counting bytes? – GionJh Aug 08 '18 at 15:12
@GionJh - Right. – T.J. Crowder Aug 08 '18 at 15:13
One last thing: I've read somewhere (I don't remember the source, sorry) that sending binary data in multipart format is not a good idea and that you should encode it in Base64. What do you think? – GionJh Aug 08 '18 at 15:14
@GionJh - I have no special knowledge about that, sorry. – T.J. Crowder Aug 08 '18 at 15:17
@T.J.Crowder You've picked a long-outdated RFC. Check RFC 7578 for the current documentation. – cassiomolin Aug 08 '18 at 15:24
@CassioMazzochiMolin - Thanks, first one I found, as surprisingly I don't have these memorized. :-) – T.J. Crowder Aug 08 '18 at 15:24
@GionJh concerning base64 see https://stackoverflow.com/questions/3538021/why-do-we-use-base64 as well as https://stackoverflow.com/questions/4070693/what-is-the-purpose-of-base-64-encoding-and-why-it-used-in-http-basic-authentica and base64 or multi-part is not an either or as they can be used together. They solve different problems. Something sent as base64 is just another document. – Richard Chambers Aug 08 '18 at 15:35
