Is there a way to get the first UTF-8 Char in a ByteString in O(1) time? I'm looking for something like:
headUtf8 :: ByteString -> Char
tailUtf8 :: ByteString -> ByteString
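For a strict ByteString, one way to get O(1) behaviour is to decode only the leading bytes by hand. The high bits of the lead byte say whether the character takes 1, 2, 3 or 4 bytes, and Data.ByteString.drop on a strict ByteString is O(1) because it only moves an offset into the same buffer. The sketch below assumes only the bytestring package. unconsUtf8 is a hypothetical helper name, and mapping malformed input to U+FFFD while skipping one byte is an assumed policy (similar in spirit to lenientDecode, not guaranteed to match it byte for byte). It also does not reject overlong encodings or encoded surrogates.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode the first UTF-8 character of a strict
-- ByteString and return it together with the remaining bytes.
-- At most 4 bytes are inspected and B.drop is O(1) on strict ByteStrings,
-- so the whole operation is O(1).
-- Assumed policy: malformed input yields U+FFFD and skips one byte.
-- Overlong encodings and encoded surrogates are NOT rejected.
unconsUtf8 :: B.ByteString -> Maybe (Char, B.ByteString)
unconsUtf8 bs
  | B.null bs = Nothing
  | otherwise = Just (decodeLead (B.index bs 0))
  where
    -- The high bits of the lead byte give the sequence length.
    decodeLead w0
      | w0 < 0x80 = (chr (fromIntegral w0), B.drop 1 bs)
      | w0 .&. 0xE0 == 0xC0 = multi 2 (w0 .&. 0x1F)
      | w0 .&. 0xF0 == 0xE0 = multi 3 (w0 .&. 0x0F)
      | w0 .&. 0xF8 == 0xF0 = multi 4 (w0 .&. 0x07)
      | otherwise = bad -- stray continuation byte or invalid lead byte

    bad = ('\xFFFD', B.drop 1 bs)

    -- Combine the lead bits with the low 6 bits of each continuation byte.
    multi :: Int -> Word8 -> (Char, B.ByteString)
    multi n lead
      | B.length bs < n = bad -- truncated sequence
      | not (all isCont conts) = bad -- missing continuation byte
      | cp > 0x10FFFF = bad -- beyond the Unicode range
      | otherwise = (chr cp, B.drop n bs)
      where
        conts = [B.index bs i | i <- [1 .. n - 1]]
        cp = foldl step (fromIntegral lead) conts
        step acc b = (acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)

    -- Continuation bytes have the form 10xxxxxx.
    isCont b = b .&. 0xC0 == 0x80

-- Partial, like Prelude's head/tail: both fail on empty input.
headUtf8 :: B.ByteString -> Char
headUtf8 = maybe (error "headUtf8: empty ByteString") fst . unconsUtf8

tailUtf8 :: B.ByteString -> B.ByteString
tailUtf8 = maybe (error "tailUtf8: empty ByteString") snd . unconsUtf8
```

In lexer code it is often more convenient to pattern match on the Maybe from unconsUtf8 than to call the partial headUtf8/tailUtf8. If an extra dependency is acceptable, the utf8-string package's Data.ByteString.UTF8 module also provides an uncons of the same shape, which may be worth comparing against.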
I'm not yet committed to either strict or lazy ByteString, but I'd prefer strict. For lazy ByteString, I can cobble something together via Text, but I'm not sure how efficient this is, especially in terms of space complexity.
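import Data.ByteString.Lazy (ByteString)
import qualified Data.Text.Lazy as T
import Data.Text.Lazy.Encoding (decodeUtf8With, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- Decode to lazy Text (invalid bytes become U+FFFD) and take the first Char.
headUtf8 :: ByteString -> Char
headUtf8 = T.head . decodeUtf8With lenientDecode

-- Decode, drop the first Char, then re-encode the remainder as UTF-8.
tailUtf8 :: ByteString -> ByteString
tailUtf8 = encodeUtf8 . T.tail . decodeUtf8With lenientDecode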
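Part of the cost of the Text round trip is that tailUtf8 re-encodes the rest of the input into fresh chunks instead of sharing the original bytes. A sketch that avoids this for lazy ByteString, assuming the bytestring and text packages: since a UTF-8 character is at most 4 bytes, it runs the strict decodeUtf8' on prefixes of length 1 to 4, and the first prefix that decodes to exactly one character gives both the character and its byte length. The tail is then just Data.ByteString.Lazy.drop of the original input. unconsUtf8Lazy is a hypothetical name, and returning U+FFFD while skipping one byte is an assumed error policy.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical helper: split off the first UTF-8 character of a lazy
-- ByteString without decoding or re-encoding the rest of the input.
unconsUtf8Lazy :: BL.ByteString -> Maybe (Char, BL.ByteString)
unconsUtf8Lazy bs
  | BL.null bs = Nothing
  | otherwise =
      case [ (c, k)
           | k <- [1 .. 4] -- a UTF-8 character is 1 to 4 bytes long
           , Right t <- [decodeUtf8' (BL.toStrict (BL.take k bs))]
           , [c] <- [T.unpack t] -- the prefix is exactly one character
           ] of
        (c, k) : _ -> Just (c, BL.drop k bs) -- tail shares the original chunks
        [] -> Just ('\xFFFD', BL.drop 1 bs) -- assumed policy for malformed input
```

Each call does a bounded amount of work (at most four decodes of at most four bytes), and because UTF-8 is prefix-free the shortest prefix that decodes cleanly is exactly the first character. Taking the prefix with BL.take also handles a character that is split across two lazy chunks.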
In case anyone is interested, this problem arises when using Alex to build a lexer that supports UTF-8 characters.¹
¹ I am aware that since Alex 3.0 you only need to provide alexGetByte (and that is great!), but I still need to be able to extract characters elsewhere in the lexer code.