I have hooked up my Play2+Scala application to the SendGrid Parse API and I'm really struggling with decoding and encoding the content of the email.
Since the emails can arrive in different encodings, SendGrid provides a JSON object, charsets:
{"to":"UTF-8","cc":"UTF-8","subject":"UTF-8","from":"UTF-8","text":"iso-8859-1","html":"iso-8859-1"}
In my test case, "text" is "Med Vänliga Hälsningar Jakobs Webshop".
If I extract that from the multipart request and print it out:
Logger.info(request.body.dataParts.get("text").get)
I get:
Med V?nliga H?lsningar Jakobs Webshop
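To rule out my own setup, here is a minimal sketch outside Play that reproduces the same kind of garbling. It assumes the part arrives as ISO-8859-1 bytes (as the charsets object claims) and is then turned into a String as UTF-8 somewhere along the way. The object name MisdecodeDemo is made up for illustration.

```scala
import java.nio.charset.StandardCharsets

object MisdecodeDemo {
  // The test text, written with Unicode escapes so the source file encoding does not matter.
  val original = "Med V\u00e4nliga H\u00e4lsningar Jakobs Webshop"

  // Assumption: this is what is on the wire. In ISO-8859-1, 'ä' is the single byte 0xE4.
  val isoBytes: Array[Byte] = original.getBytes(StandardCharsets.ISO_8859_1)

  // Decoding those bytes as UTF-8: 0xE4 followed by an ASCII letter is not a valid
  // UTF-8 sequence, so the decoder substitutes the replacement character U+FFFD.
  val misdecoded: String = new String(isoBytes, StandardCharsets.UTF_8)

  // Decoding with the charset the bytes were actually written in gives the text back.
  val decoded: String = new String(isoBytes, StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(misdecoded.contains("\uFFFD")) // was 'ä' replaced?
    println(decoded == original)           // does the right charset round-trip?
  }
}
```

If that assumption holds, the ? in my log would presumably be the replacement character being written to a console that cannot display it, but I have not verified that.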
OK, so with the info given by SendGrid, let's fix the string so that it is UTF-8.
def parseMail = Action(parse.multipartFormData) {
  request => {
    val inputBuffer = request.body.dataParts.get("text").map {
      v => ByteBuffer.wrap(v.head.getBytes())
    }
    val fromCharset = Charset.forName("ISO-8859-1")
    val toCharset = Charset.forName("UTF-8")
    val data = fromCharset.decode(inputBuffer.get)
    Logger.info("" + data)
    val outputBuffer = toCharset.encode(data)
    val text = new String(outputBuffer.array())
    // Save stuff to MongoDB instance
  }
}
This results in:
Med V�nliga H�lsningar Jakobs Webshop
So this is very strange. This should work.
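My current suspicion, sketched below outside Play, is that the conversion above cannot work if the String has already been decoded with the wrong charset, because the original byte 0xE4 is gone by then. The sketch repeats the same steps with explicit charsets, since `getBytes()` and `new String(bytes)` without arguments use the platform default, and it copies only the valid part of the buffer instead of calling `array()`. The object name RoundTripDemo is hypothetical, and the "already decoded as UTF-8" step is my assumption about what happens before the action runs.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object RoundTripDemo {
  // Assumption: the bytes on the wire are ISO-8859-1.
  val isoBytes: Array[Byte] = "V\u00e4nliga".getBytes(StandardCharsets.ISO_8859_1)

  // Assumption: something upstream has already decoded them as UTF-8,
  // which turns the byte 0xE4 into the replacement character U+FFFD.
  val alreadyDecoded: String = new String(isoBytes, StandardCharsets.UTF_8)

  // The same steps as in my action, but with explicit charsets.
  val utf8Bytes: Array[Byte] = alreadyDecoded.getBytes(StandardCharsets.UTF_8)
  val chars = StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(utf8Bytes))
  val out: ByteBuffer = StandardCharsets.UTF_8.encode(chars)

  // Copy only the bytes between position and limit; the backing array may be larger.
  val bytes = new Array[Byte](out.remaining())
  out.get(bytes)
  val text: String = new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    println(text)
    println(text.contains("\u00e4")) // did 'ä' come back?
  }
}
```

In this sketch the three UTF-8 bytes of U+FFFD are reinterpreted as three separate ISO-8859-1 characters, so the 'ä' never comes back no matter how the String is re-encoded afterwards.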
I wonder what actually happens in the body parser parse.multipartFormData and the data part handler:
def handleDataPart: PartHandler[Part] = {
  case headers @ PartInfoMatcher(partName) if !FileInfoMatcher.unapply(headers).isDefined =>
    Traversable.takeUpTo[Array[Byte]](DEFAULT_MAX_TEXT_LENGTH)
      .transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
      .flatMap { data =>
        Cont({
          case Input.El(_) => Done(MaxDataPartSizeExceeded(partName), Input.Empty)
          case in => Done(data, in)
        })
      }(play.core.Execution.internalContext)
}
When consuming the data, a new String is created using the utf-8 charset:
.transform(Iteratee.consume[Array[Byte]]().map(bytes => DataPart(partName, new String(bytes, "utf-8")))(play.core.Execution.internalContext))
Does this mean that my ISO-8859-1 encoded text is decoded as UTF-8 when parsed? If so, how should I write my parser so that it decodes each param according to the provided JSON object charsets? Clearly I'm doing something wrong, but I can't figure it out!
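What I have in mind is roughly the sketch below: keep each part as raw bytes and only build a String once the charset for that part is known. The helper decodeParts and the object name are hypothetical, the UTF-8 fallback for parts missing from charsets is my own assumption, and `Charset.forName` will throw if SendGrid ever sends a charset name the JVM does not support. Getting at the raw bytes would need a body parser that does not turn the part into a String first, which is the piece I do not know how to write.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetAwareDecoding {
  // Hypothetical helper: decode each raw part with the charset reported for it,
  // falling back to UTF-8 (an assumption) when the part is not listed in `charsets`.
  def decodeParts(rawParts: Map[String, Array[Byte]],
                  charsets: Map[String, String]): Map[String, String] =
    rawParts.map { case (name, bytes) =>
      val charset = charsets.get(name)
        .map(csName => Charset.forName(csName))
        .getOrElse(StandardCharsets.UTF_8)
      name -> new String(bytes, charset)
    }

  def main(args: Array[String]): Unit = {
    // Simulated input: "text" in ISO-8859-1 and "subject" in UTF-8, as in the charsets object.
    val raw = Map(
      "text"    -> "Med V\u00e4nliga H\u00e4lsningar".getBytes(StandardCharsets.ISO_8859_1),
      "subject" -> "H\u00e4lsningar".getBytes(StandardCharsets.UTF_8)
    )
    val charsets = Map("text" -> "iso-8859-1", "subject" -> "UTF-8")
    decodeParts(raw, charsets).foreach { case (k, v) => println(s"$k: $v") }
  }
}
```

Once the Strings are decoded this way they are ordinary JVM Strings, so as far as I understand no further "conversion to UTF-8" should be needed before saving them.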