24

I want to split ByteString to words like so:

import qualified Data.ByteString as BS

main = do
    input <- BS.getLine
    let xs = BS.split ' ' input 

But it appears that GHC can't convert a character literal to Word8 by itself, so I got:

Couldn't match expected type `GHC.Word.Word8'
            with actual type `Char'
In the first argument of `BS.split', namely ' '
In the expression: BS.split ' ' input

Hoogle doesn't find anything with the type signature Char -> Word8, and Word.Word8 ' ' is rejected as an invalid constructor. Any ideas on how to fix it?

Andrew
  • 5
    Don't use `ByteString` for text! Use [`Text`](http://hackage.haskell.org/package/text) instead. – Daniel Wagner May 16 '12 at 18:59
  • @DanielWagner Why not? Is it faster than `ByteString`? – Andrew May 16 '12 at 20:00
  • 6
    `Text` is unicode-friendly, so your strings will be strings in all countries. `ByteString` is for binary parsing, raw memory access, and can't handle anything other than ascii or latin1. – Don Stewart May 16 '12 at 20:27
  • Interesting, thanks. That was for a programming-contest problem, so the range of possible encodings is limited to ascii. – Andrew May 16 '12 at 20:52
  • 1
    You probably want to use import qualified Data.ByteString.Char8 as B instead – George Co Sep 05 '21 at 14:44

5 Answers

35

The Data.ByteString.Char8 module allows you to treat Word8 values in the bytestrings as Char. Just

import qualified Data.ByteString.Char8 as C

then refer to e.g. C.split. It's the same bytestring under the hood, but the Char-oriented functions are provided for convenient byte/ascii parsing.
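As a rough sketch of what that looks like for the code in the question (the helper name splitWords and the sample input are made up for illustration):

```haskell
import qualified Data.ByteString.Char8 as C

-- Hypothetical helper: split a line on single space characters.
-- C.split takes a Char, so no Word8 conversion is needed.
splitWords :: C.ByteString -> [C.ByteString]
splitWords = C.split ' '

main :: IO ()
main = print (splitWords (C.pack "foo bar baz"))
-- prints ["foo","bar","baz"]
```

Note that C.split yields an empty chunk between two consecutive spaces; if any run of whitespace should act as one separator, C.words from the same module may be the better fit.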

Don Stewart
17

In case you really need Data.ByteString (not Data.ByteString.Char8), you can do what Data.ByteString itself does to convert between Word8 and Char:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BS (c2w, w2c)

main = do
    input <- BS.getLine
    let xs = BS.split (BS.c2w ' ') input 
    return ()
Grwlf
4

For people looking for a simple Char -> Word8 conversion using only the base library:

import Data.Word

charToWord8 :: Char -> Word8
charToWord8 = toEnum . fromEnum
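A minimal usage sketch, applying this helper to the split from the question (the sample string is only an illustration):

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

charToWord8 :: Char -> Word8
charToWord8 = toEnum . fromEnum

main :: IO ()
main = do
  -- Build a ByteString from ASCII characters, then split on the space byte.
  let input = BS.pack (map charToWord8 "foo bar")
  print (BS.split (charToWord8 ' ') input)
  -- prints ["foo","bar"]
```

One thing to be aware of: in GHC, toEnum for Word8 is bounds-checked, so a Char above '\255' causes a runtime error here instead of wrapping around.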
2

I want to directly address the question in the subject line, which led me here in the first place.

You can convert a single Char to a single Word8 with fromIntegral.ord:

λ> import qualified Data.ByteString as BS
λ> import Data.Char(ord)

λ> BS.split (fromIntegral.ord $ 'd') $ BS.pack . map (fromIntegral.ord) $ "abcdef"

["abc","ef"]

Keep in mind that this conversion silently wraps around (modulo 256) on overflow, as demonstrated below. You have to ensure that your Char fits in 8 bits if you do not want this to occur.

λ> 260 :: Word8

4
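If silent wrap-around is a concern, one option is a checked conversion. The following is only a sketch; charToWord8Safe is a hypothetical helper, not something provided by base or bytestring:

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: Nothing when the code point does not fit in 8 bits.
charToWord8Safe :: Char -> Maybe Word8
charToWord8Safe c
  | n <= 0xFF = Just (fromIntegral n)
  | otherwise = Nothing
  where
    n = ord c

main :: IO ()
main = do
  print (charToWord8Safe ' ')                   -- Just 32
  print (charToWord8Safe '\x3bb')               -- Nothing (U+03BB is 955)
  print (fromIntegral (ord '\x3bb') :: Word8)   -- 187, i.e. 955 mod 256
```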

Of course, for your particular problem, it is preferable to use the Data.ByteString.Char8 module as already pointed out in the accepted answer.

oo_miguel
0

Another possible solution is the following:

import Data.Char (ord)
import Data.Word (Word8)

charToWord8 :: Char -> Word8
charToWord8 = fromIntegral . ord
{-# INLINE charToWord8 #-}

where ord :: Char -> Int comes from Data.Char, and fromIntegral narrows the resulting Int to a Word8.

Jonathan Prieto-Cubides