From the Wireshark comments, it looks like python-requests is doing it wrong, though there might not be a single "right answer".
RFC 2388 says:
Field names originally in non-ASCII character sets may be encoded within the value of the "name" parameter using the standard method described in RFC 2047.
RFC 2047, in turn, says:
Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.
and goes on to describe "Q" and "B" encoding methods. Using the "Q" (quoted-printable) method, the name would be:
=?utf-8?q?=E2=98=83?=
BUT, as RFC 6266 points out (quoting Section 5 of RFC 2047):
An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.
so we're not allowed to do that. (Kudos to @Lukasa for this catch!)
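For illustration only, here is a rough Python sketch of what the "Q" encoded-word form looks like and how it round-trips. The helper name q_encoded_word is made up for this example, and it simply hex-escapes every octet (which "Q" encoding permits) rather than leaving printable ASCII alone. As noted above, this form is not allowed in a Content-Disposition parameter, so this only shows the shape of the thing:

```python
from email.header import decode_header


def q_encoded_word(text, charset="utf-8"):
    """Hypothetical helper: build an RFC 2047 "Q" encoded-word.

    Every octet is written as =XX, which is always permitted by "Q"
    encoding, even though a real encoder would leave most ASCII as-is.
    """
    raw = text.encode(charset)
    body = "".join("=%02X" % octet for octet in raw)
    return "=?%s?q?%s?=" % (charset, body)


word = q_encoded_word("☃")

# The standard library can decode it back to the original octets.
decoded_bytes, decoded_charset = decode_header(word)[0]
round_tripped = decoded_bytes.decode(decoded_charset)
```

Here word should come out as the same =?utf-8?q?=E2=98=83?= string shown above.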
RFC 2388 also says:
The original local file name may be supplied as well, either as a
"filename" parameter either of the "content-disposition: form-data"
header or, in the case of multiple files, in a "content-disposition:
file" header of the subpart. The sending application MAY supply a
file name; if the file name of the sender's operating system is not
in US-ASCII, the file name might be approximated, or encoded using
the method of RFC 2231.
And RFC 2231 describes a method that looks more like what you're seeing. In it,
Asterisks ("*") are reused to provide the indicator that language and
character set information is present and encoding is being used. A
single quote ("'") is used to delimit the character set and language
information at the beginning of the parameter value. Percent signs
("%") are used as the encoding flag, which agrees with RFC 2047.
Specifically, an asterisk at the end of a parameter name acts as an
indicator that character set and language information may appear at
the beginning of the parameter value. A single quote is used to
separate the character set, language, and actual value information in
the parameter value string, and an percent sign is used to flag
octets encoded in hexadecimal.
That is, if this method is employed (and supported on both ends), the name should be:
name*=utf-8''%E2%98%83
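A minimal Python sketch of producing that value, assuming UTF-8 and an empty language tag. The function name ext_param is invented for this example; urllib.parse.quote and email.utils.encode_rfc2231 are standard-library calls:

```python
from email.utils import encode_rfc2231
from urllib.parse import quote


def ext_param(param_name, value, charset="utf-8", language=""):
    """Hypothetical helper: format param*=charset'language'percent-encoded."""
    # safe="" so that characters such as "/" are percent-encoded as well.
    encoded = quote(value, safe="", encoding=charset)
    return "%s*=%s'%s'%s" % (param_name, charset, language, encoded)


snowman_param = ext_param("name", "☃")

# The stdlib helper produces the value part (everything after "name*=").
stdlib_value = encode_rfc2231("☃", "utf-8")
```

With "☃" as input, snowman_param should match the name*=utf-8''%E2%98%83 line above.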
Fortunately, RFC 5987 adds an encoding based on RFC 2231 to HTTP headers! (Kudos to @bobince for this find.) It says you can (and probably should) include both an RFC 2231-style value and a plain value:
Header field specifications need to define whether multiple instances
of parameters with identical parmname components are allowed, and how
they should be processed. This specification suggests that a
parameter using the extended syntax takes precedence. This would
allow producers to use both formats without breaking recipients that
do not understand the extended syntax yet.
Example:
foo: bar; title="EURO exchange rates";
title*=utf-8''%e2%82%ac%20exchange%20rates
In their example, however, they "dumb down" the plain value for "legacy clients". This isn't really an option for a form-field name, so it seems like the best approach might be to include both name= and name*= versions, where the plain value is (as @bobince describes it) "just sending the bytes, quoted, in the same encoding as the form", like:
Content-Disposition: form-data; name="☃"; name*=utf-8''%E2%98%83
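To make that concrete, here is a sketch in Python of both sides, under a few assumptions: the form encoding is UTF-8, the language tag is left empty, and the function names (form_data_disposition, preferred_name) are invented for this example. This is not what python-requests does; it only illustrates the "send both, prefer the extended one" idea from RFC 5987:

```python
from urllib.parse import quote, unquote


def form_data_disposition(field_name, charset="utf-8"):
    """Hypothetical producer: emit both name= and name*= for one field."""
    # Plain form: the characters as-is inside a quoted-string, with
    # backslash and double quote escaped.
    plain = field_name.replace("\\", "\\\\").replace('"', '\\"')
    # Extended form: charset, empty language tag, percent-encoded octets.
    encoded = quote(field_name, safe="", encoding=charset)
    extended = "%s''%s" % (charset, encoded)
    return 'form-data; name="%s"; name*=%s' % (plain, extended)


def preferred_name(plain_value, extended_value=None):
    """Hypothetical recipient: the extended value wins when present."""
    if extended_value is None:
        return plain_value
    charset, _language, encoded = extended_value.split("'", 2)
    return unquote(encoded, encoding=charset)


header_value = form_data_disposition("☃")
title = preferred_name(
    "EURO exchange rates",
    "utf-8''%e2%82%ac%20exchange%20rates",
)
```

A recipient that doesn't understand name*= would just see the plain name= value, which is the whole point of sending both.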
Finally, see http://larry.masinter.net/1307multipart-form-data.pdf (also https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909#c8 ), wherein it is recommended to avoid the problem by sticking with ASCII form field names.