41

I'm writing a script that uploads a file to a CGI script that expects a multipart request, such as the one sent by a form on an HTML page. The boundary is a unique token that delimits the file contents in the request body. Here's an example body:

--BOUNDARY
Content-Disposition: form-data; name="paramname"; filename="foo.txt"
Content-Type: text/plain

... file contents here ...
--BOUNDARY--

The boundary cannot be present in the file contents, for obvious reasons.

What should I do in order to create a unique boundary? Should I generate a random string, check whether it occurs in the file contents, and if it does, generate a new one, repeating until I have a unique string? Or would a "pretty random token" (say, a combination of timestamp, process ID, etc.) be enough?

August Lilleaas
  • 2
    What programming language do you use? Usually such a thing is handled by a library. –  Jan 15 '10 at 12:03
  • I'm using Ruby. It would have to be in the stdlib, though, can't use gems since the script should be runnable on any system with ruby installed, without having to install gems. – August Lilleaas Jan 15 '10 at 12:32
  • 1
    BOUNDARY may be fine, but be sure to use \r\n (DOS line endings), because with just \n the request fails with a "Header section has more than 10240 bytes" error. – andrej Jun 18 '20 at 14:18

4 Answers

56

If you use something random enough, like a GUID, there shouldn't be any need to hunt through the payload to check whether the boundary already occurs in it. Something like:

----=NextPart_3676416B-9AD6-440C-B3C8-FC66DDC7DB45
Header:....

Payload
----=NextPart_3676416B-9AD6-440C-B3C8-FC66DDC7DB45--
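
Since the asker is on Ruby and limited to the stdlib, here is a minimal sketch of the same idea using SecureRandom.uuid. The helper names generate_boundary and build_multipart_body are hypothetical (made up for this illustration), and the "NextPart_" prefix is just an assumed label. The body is joined with \r\n, as multipart bodies require.

```ruby
require "securerandom"

# Hypothetical helper: a boundary built from a random UUID (stdlib only).
def generate_boundary
  "NextPart_#{SecureRandom.uuid.upcase}"
end

# Hypothetical helper: builds a single-file multipart/form-data body.
# Every line, including the closing delimiter, is terminated with CRLF.
def build_multipart_body(boundary, field_name, filename, content, content_type = "text/plain")
  [
    "--#{boundary}",
    %(Content-Disposition: form-data; name="#{field_name}"; filename="#{filename}"),
    "Content-Type: #{content_type}",
    "",
    content,
    "--#{boundary}--",
    ""
  ].join("\r\n")
end

boundary = generate_boundary
body = build_multipart_body(boundary, "paramname", "foo.txt", "... file contents here ...")
```

The request would then carry a header along the lines of Content-Type: multipart/form-data; boundary=<the generated value>, so the server knows which token to split on.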

AnthonyWJones
14

For Java users:

import java.util.Random;

// Characters the generated boundary is drawn from.
private final static char[] MULTIPART_CHARS =
        "-_1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
                .toCharArray();

// Builds a random boundary of 30 to 40 characters from MULTIPART_CHARS.
protected String generateBoundary() {
    StringBuilder buffer = new StringBuilder();
    Random rand = new Random();
    int count = rand.nextInt(11) + 30; // a random size from 30 to 40
    for (int i = 0; i < count; i++) {
        buffer.append(MULTIPART_CHARS[rand.nextInt(MULTIPART_CHARS.length)]);
    }
    return buffer.toString();
}

Reference URL: http://hc.apache.org/httpcomponents-client-ga/httpmime/xref/org/apache/http/entity/mime/MultipartEntity.html

Kirk Woll
John
1

If you are feeling paranoid, you can generate a random boundary and search for it in the data to be sent; if it is found, append a random character (or generate a new boundary) and repeat. But in my experience, any arbitrary non-dictionary string of 10 or so characters is all but impossible to hit by accident, so picking something like ---BOUNDARY---BOUNDARY---BOUNDARY--- is perfectly sufficient.
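
For the paranoid route, a rough Ruby sketch of the generate-check-repeat loop might look like this. The name boundary_not_in, the "Boundary_" prefix and the generator: keyword are all assumptions made for the example; the generator is only injectable so the retry path can be exercised.

```ruby
require "securerandom"

# Hypothetical helper: keeps drawing candidate boundaries until one does not
# occur anywhere in the payload, then returns it.
# `generator` is an assumed hook that produces one candidate per call.
def boundary_not_in(payload, generator: -> { "Boundary_#{SecureRandom.hex(16)}" })
  loop do
    candidate = generator.call
    # Searching for the bare token is slightly stricter than needed, since
    # only "--" + token at the start of a line would actually clash.
    return candidate unless payload.include?(candidate)
  end
end

payload = "... file contents here ..."
boundary = boundary_not_in(payload)
```

With a 128-bit random token the loop will in practice run exactly once, which is the point made above: the check costs one scan of the payload and buys very little.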

SF.
  • 46
    No, it is not sufficient. Because you won't be able to send your program source code (or this comment) using your program. – stepancheg Jul 03 '10 at 17:36
  • 5
    @stepancheg: It seems you are feeling paranoid, in this case use the solution from the first paragraph of my answer. If you are mentally healthy though, use `Content-Encoding: gzip` and stop worrying about users out there trying to get you. – SF. Jul 05 '10 at 08:42
  • 1
    It is the responsibility of the programmer to avoid foreseeable future errors. – fikr4n Mar 02 '16 at 22:42
  • @BornToCode: If the user purposefully tries to make the application fail, you can't stop them - you may only limit the impact to that single user. The chance that a random *compressed* content accidentally encodes during compression to one specific string of 39 characters is around 1:2^47 which means it's well within limits of acceptability (UUID is not better and it is deemed sufficient.) - one would need to purposefully construct a content that compresses to the boundary code, and then we can just reject it; it's not a valid content but a malicious attack. – SF. Mar 03 '16 at 14:57
  • I think that many users will copy the boundary from this answer, as well as many other boundaries found on Stack Overflow and in other tutorials. Anyway, this kind of vulnerability is not so dangerous :D – kelin Jan 19 '18 at 06:48
  • @kelin: and as long as they don't send this answer as uncompressed, unencoded plain text, it will work perfectly well. If they are encoding the bound context some way (base64, gzip, whatever, even rot13 will do), this becomes complete non-issue. – SF. Jan 19 '18 at 07:05
  • @SF., so you imply that we should compress body parts (without Content-Disposition etc), not the entire multipart body? (I'm asking because I don't know, not because I want to argue) – kelin Jan 19 '18 at 07:17
  • @kelin: Yes - that's the preferred approach, especially that different parts may be optimally compressed using different methods. You'll usually have a heading part that is a manifest, checksums, list of encodings etc, that is best to be left readable (plaintext); you'll have textual content (XML, SVG, plain text) that is good for compression, you'll likely have parts that are already compressed (like JPEG) which would not benefit from extra compression, you may have sensitive parts you'd want encrypted etc. – SF. Jan 19 '18 at 07:27
  • In case you want to encode the entirety of the content, you should know what sort of content is to be expected, and so, not send its own source code over it. – SF. Jan 19 '18 at 07:28
1

And for the Swift people (to balance the Java):

func createBoundaryString() -> String {
    var str = ""
    let length = arc4random_uniform(11) + 30
    let charSet = [Character]("-_1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

    for _ in 0..<length {
        str.append(charSet[Int(arc4random_uniform(UInt32(charSet.count)))])
    }
    return str
}
sketchyTech