
I'm trying to test some code that uses scodec.bits.ByteVector.

In particular I'm using ByteVector.encodeUtf8(str: String): Either[CharacterCodingException, ByteVector]

Since this can return an error if encoding to UTF-8 fails, I have to handle the error condition. Of course, I can hide the call behind a trait and mock it so that it forcibly returns a Left[CharacterCodingException], but that's too onerous.

What I would love to do is to create a String that has some invalid utf-8 bytes and call encodeUtf8 with that.

My guess is that this is not possible. No matter what I do, the String class will coerce any bad entries into something that is nonsensical but still valid UTF-8 (e.g. �). Is this right?

This is how I've been trying to create such a string:

new String(Array(255.toByte), "utf-8")
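For reference, a minimal sketch of what that constructor does with the bad byte, using only the JDK. The object name LenientDecoding is hypothetical and exists only for this illustration.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical holder object, used only for this illustration.
object LenientDecoding {
  // 0xFF is never a valid byte in UTF-8. The String constructor does not
  // fail on it; it substitutes the replacement character U+FFFD.
  val decoded: String = new String(Array(255.toByte), StandardCharsets.UTF_8)

  // U+FFFD is an ordinary, encodable character, so the round trip succeeds.
  val reencoded: Array[Byte] = decoded.getBytes(StandardCharsets.UTF_8)
}
```

The bad byte is gone by the time the String exists, so nothing invalid is left for a later UTF-8 encoding step to reject.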

I also tried to create a string in some other encodings and then use that to encode to UTF-8 but ByteVector handles it.

Is this possible?

encee

1 Answer


"\uDC00" is an invalid String that cannot be encoded in UTF-8. That's because it contains an unpaired surrogate code point.
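A rough sketch of why this fails, using only the JDK rather than scodec. The helper encodeStrict is hypothetical and is not ByteVector.encodeUtf8 itself; it only assumes that a strict encoder, one that reports malformed input instead of replacing it, is what rejects the string.

```scala
import java.nio.CharBuffer
import java.nio.charset.{CharacterCodingException, CodingErrorAction, StandardCharsets}

// Hypothetical helper, used only for this illustration.
object StrictUtf8 {
  def encodeStrict(s: String): Either[CharacterCodingException, Array[Byte]] = {
    // REPORT makes the encoder throw instead of substituting a replacement.
    val encoder = StandardCharsets.UTF_8.newEncoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
      val buf = encoder.encode(CharBuffer.wrap(s))
      val bytes = new Array[Byte](buf.remaining())
      buf.get(bytes)
      Right(bytes)
    } catch {
      // MalformedInputException is a subclass of CharacterCodingException.
      case e: CharacterCodingException => Left(e)
    }
  }
}
```

Calling StrictUtf8.encodeStrict("\uDC00") yields a Left, because a lone low surrogate has no UTF-8 encoding. By contrast, "\uDC00".getBytes(StandardCharsets.UTF_8) does not fail; it silently substitutes a replacement byte.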

sjrd
  • Oh wow, excellent thank you! This does indeed work. ByteVector.encodeUtf8 returns a `Left(java.nio.charset.MalformedInputException: Input length = 1)` – encee Apr 29 '16 at 20:31