Before finishing the question, I looked into this some more and found the answer: RFC 5987 encoding (assuming the HTTP server at the other end handles that correctly).
I was able to do this in my BASH script, thanks to an answer to a related question (how to URL-encode within BASH). See the answer beginning "Here is the pure BASH answer" to the question "URLEncode from a bash script".
My code to do the encoding:
    #-----------------------------------------------
    # RFC 5987 encode a string
    #
    # Input string is in parameter $1.
    # Result is stored in global variable URL_ENCODED_STR
    #-----------------------------------------------
    function rfc5987_encode ()
    {
        # Work byte by byte, so that multi-byte UTF-8 characters are
        # encoded one octet at a time (the preamble promises UTF-8).
        local LC_ALL=C
        local string="${1}"
        local strlen=${#string}
        local encoded=
        local pos
        local c
        local o
        local TICK=\'
        #
        # Set up encoded string preamble, which is:
        #   charset "'" [ language ] "'" value-chars
        #
        encoded="UTF-8${TICK}${TICK}"
        #
        # Loop through string, examining each character.
        # Safe characters are copied to new string as-is.
        # Unsafe characters are copied as '%' and the hex code
        # of the character (using the bash built-in 'printf').
        #
        # Safe characters are:
        #   ALPHA / DIGIT
        #   "!" / "#" / "$" / "&" / "+" / "-" / "."
        #   "^" / "_" / "`" / "|" / "~"
        #
        # The "-" goes last in the bracket expression: written as "+-."
        # it would form a range that also matches "," (which is not safe).
        #
        for (( pos=0 ; pos < strlen ; pos++ )); do
            c=${string:$pos:1}              # 'c' is current character
            case "$c" in
                [\!\#$\&+.\^_\`\|~a-zA-Z0-9-] )   # safe characters copied as-is
                    o="${c}"
                    ;;
                * )                         # everything else is encoded
                    printf -v o '%%%02x' "'$c"
                    ;;
            esac
            encoded+="${o}"                 # 'o' is output character
        done
        URL_ENCODED_STR="${encoded}"
    }

In accordance with RFC 5987, one also has to add an asterisk to the end of the header field name.
Using this, the multi-line string:

    This is line one.
    This is line two.

    This is line four (line three was blank).
    Lots of "special" and 'funny' characters might lurk here, you know?

when sent in a header field named X-Foo-Caption, ends up as:

    curl -H X-Foo-Caption*:UTF-8''This%20is%20line%20one.%0dThis%20is%20line%20two.%0d%0dThis%20is%20line%20four%20%28line%20three%20was%20blank%29.%0dLots%20of%20%22special%22%20and%20%27funny%27%20characters%20might%20lurk%20here%2c%20you%20know%3f -H 'X-Smug-Keywords: blank;lines;funny;characters;weird special stuff;who knows?;
To my utter amazement, the server handles this just fine.
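If the server is not so forgiving, one way to sanity-check an encoded value before sending it is to decode it locally. The following is only a rough sketch of what a receiver might do: the function name rfc5987_decode and the variable RFC5987_DECODED are made up for this example, it assumes the charset is UTF-8, and it ignores the optional language tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): decode a value of
# the form   charset'language'percent-encoded-text
# Result is stored in global variable RFC5987_DECODED.
rfc5987_decode() {
    local ext=$1
    local charset=${ext%%\'*}     # text before the first tick
    local rest=${ext#*\'}         # drop the charset and the first tick
    local value=${rest#*\'}       # drop the language tag and the second tick

    # Assumption for this sketch: only UTF-8 is supported.
    [[ $charset == [Uu][Tt][Ff]-8 ]] || return 1

    # Turn every %XX into \xXX and let printf's %b expand the escapes.
    printf -v RFC5987_DECODED '%b' "${value//%/\\x}"
}

rfc5987_decode "UTF-8''This%20is%20line%20one.%0dThis%20is%20line%20two."
printf '%q\n' "$RFC5987_DECODED"    # %q makes the carriage return visible
```

Using printf -v rather than command substitution keeps any trailing line breaks in the decoded text.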
Note that this is not URL encoding. URL encoding is used for URLs, while RFC 5987 encoding is used for HTTP headers. The end results are often different, because the two have different sets of safe characters and slightly different outputs. Examples:
    Original     URL-encoded    RFC 5987-encoded
    ========     ===========    ================
    "a space"    a%20space      UTF-8''a%20space
    "foo"        foo            UTF-8''foo
    "100%"       100%25         UTF-8''100%25
    "$10.30"     %2410.30       UTF-8''$10.30
    "#1 fun"     %231%20fun     UTF-8''#1%20fun
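The difference comes down to which characters each scheme treats as safe. Here is a minimal sketch that reproduces rows like the ones above from two safe sets. The names pct_encode, url_encode and ext_value are invented for this example, and the URL column assumes the RFC 3986 "unreserved" set is what the URL encoder leaves alone.

```shell
#!/usr/bin/env bash
# Hypothetical generic percent-encoder (not from the original script).
#   $1 = bracket expression listing the safe characters
#   $2 = string to encode
pct_encode() {
    local LC_ALL=C safe=$1 s=$2 out= c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            $safe) out+=$c ;;                              # safe: copy as-is
            *)     printf -v c '%%%02x' "'$c"; out+=$c ;;  # else: %XX
        esac
    done
    printf '%s' "$out"
}

URL_SAFE='[a-zA-Z0-9._~-]'                # RFC 3986 unreserved (assumption)
RFC5987_SAFE='[a-zA-Z0-9!#$&+.^_`|~-]'    # RFC 5987 attr-char

url_encode() { pct_encode "$URL_SAFE" "$1"; }
ext_value()  { printf "UTF-8''%s" "$(pct_encode "$RFC5987_SAFE" "$1")"; }

for s in 'a space' 'foo' '100%' '$10.30' '#1 fun'; do
    printf '%-10s %-12s %s\n' "$s" "$(url_encode "$s")" "$(ext_value "$s")"
done
```

Keeping the safe set as a parameter makes it easy to see that the encoding loop is the same in both cases and only the set differs.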
Note also that the HTTP header name needs to have an asterisk appended, to indicate that the value has been RFC 5987 encoded, so "X-Foo: #1 fun" gets sent in HTTP as "X-Foo*: UTF-8''#1%20fun".