28

I'm a Haskell newbie, and having a bit of trouble figuring out how to pattern match a ByteString. The [Char] version of my function looks like:

dropAB :: String -> String
dropAB []       = []
dropAB (x:[])   = x:[]
dropAB (x:y:xs) = if x=='a' && y=='b'
                  then dropAB xs
                  else x:(dropAB $ y:xs) 

As expected, this filters out all occurrences of "ab" from a string. However, I have problems trying to apply this to a ByteString.

The naive version

dropR :: BS.ByteString -> BS.ByteString
dropR []         = []
dropR (x:[])     = [x]
<...>

yields

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

[] is clearly the culprit, as it is for a regular String not a ByteString. Subbing in BS.empty seems like the right thing but gives "Qualified name in the binding position: BS.empty." Leaving us to try

dropR :: BS.ByteString -> BS.ByteString
dropR empty              = empty        
dropR (x cons empty)     = x cons empty
<...>

this gives "parse error in pattern" for (x cons empty). I don't really know what else I can do here.

As a side note, what I'm trying to do with this function is to filter out a specific UTF16 character from some text. If there's a clean way to accomplish that, I'd love to hear it, but this pattern matching error seems like something that a newbie haskeller should really understand.

Cactus
  • 27,075
  • 9
  • 69
  • 149
LOS
  • 523
  • 4
  • 6
  • I'm not sure, but maybe guards rather than pattern matching? – li.davidm Oct 29 '10 at 23:24
  • 1
    You can't filter out a UTF-16 character. Maybe you meant "filter out a character of a text which is encoded in UTF-16". – gawi Oct 30 '10 at 00:55
  • For posterity, since the OP asked: consider if you _really_ need to be processing your `ByteString` as if it is a linked list. In this case, [`decodeUtf16LE`](http://hackage.haskell.org/package/text/docs/Data-Text-Encoding.html#v:decodeUtf16LE) and [`Text.filter`](http://hackage.haskell.org/package/text/docs/Data-Text.html#v:filter) `(/= '消')` are cleaner high-level tools to achieve what LOS is trying to do. Whatever case brought you to this question might have a similar solution! – Lynn Mar 25 '19 at 22:29

5 Answers5

27

You can use view patterns for such things

{-# LANGUAGE ViewPatterns #-}    
import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w) 

dropR :: ByteString -> ByteString
dropR (uncons -> Nothing) = empty
dropR (uncons -> Just (x,uncons -> Nothing)) = singleton x
dropR (uncons -> Just (x,uncons -> Just(y,xs))) =
    if x == c2w 'a' && y == c2w 'b'
    then dropR xs
    else cons x (dropR $ cons y xs)
Ed'ka
  • 6,595
  • 29
  • 30
  • 3
    bytestrings make Haskell code look so ugly; all the elegance of prelude String seems to go away :( – mntk123 Sep 18 '15 at 06:55
  • 1
    @mntk123 Haskell strings are character link lists and quite inefficient. They still exist for backward compatibility. Bytestring as well as Text package offer much more powerful solutions to the same problem. – Jaseem Nov 01 '16 at 09:25
14

The latest version of GHC (7.8) has a feature called pattern synonyms which can be added to gawi's example:

{-# LANGUAGE ViewPatterns, PatternSynonyms #-}

import Data.ByteString (ByteString, cons, uncons, singleton, empty)
import Data.ByteString.Internal (c2w)

infixr 5 :<

pattern b :< bs <- (uncons -> Just (b, bs))
pattern Empty   <- (uncons -> Nothing)

dropR :: ByteString -> ByteString
dropR Empty          = empty
dropR (x :< Empty)   = singleton x
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

Going further you can abstract this to work on any type class (this will look nicer when/if we get associated pattern synonyms). The pattern definitions stay the same:

{-# LANGUAGE ViewPatterns, PatternSynonyms, TypeFamilies #-}

import qualified Data.ByteString as BS
import Data.ByteString (ByteString, singleton)
import Data.ByteString.Internal (c2w)
import Data.Word

class ListLike l where
  type Elem l

  empty  :: l
  uncons :: l -> Maybe (Elem l, l)
  cons   :: Elem l -> l -> l

instance ListLike ByteString where
  type Elem ByteString = Word8

  empty  = BS.empty
  uncons = BS.uncons
  cons   = BS.cons

instance ListLike [a] where
  type Elem [a] = a

  empty         = []
  uncons []     = Nothing
  uncons (x:xs) = Just (x, xs)
  cons          = (:)

in which case dropR can work on both [Word8] and ByteString:

-- dropR :: [Word8]    -> [Word8]
-- dropR :: ByteString -> ByteString
dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty          = empty
dropR (x :< Empty)   = cons x empty
dropR (x :< y :< xs)
  | x == c2w 'a' && y == c2w 'b' = dropR xs
  | otherwise                    = cons x (dropR (cons y xs))

And for the hell of it:

import Data.ByteString.Internal (w2c)

infixr 5 :•    
pattern b :• bs <- (w2c -> b) :< bs

dropR :: (ListLike l, Elem l ~ Word8) => l -> l
dropR Empty              = empty
dropR (x   :< Empty)     = cons x empty
dropR ('a' :• 'b' :• xs) = dropR xs
dropR (x   :< y   :< xs) = cons x (dropR (cons y xs))

You can see more on my post on pattern synonyms.

Iceland_jack
  • 6,848
  • 7
  • 37
  • 46
10

Patterns use data constructors. http://book.realworldhaskell.org/read/defining-types-streamlining-functions.html

Your empty is just a binding for the first parameter, it could have been x and it would not change anything.

You can't reference a normal function in your pattern so (x cons empty) is not legal. Note: I guess (cons x empty) is really what you meant but this is also illegal.

ByteString is quite different from String. String is an alias of [Char], so it's a real list and the : operator can be used in patterns.

ByteString is Data.ByteString.Internal.PS !(GHC.ForeignPtr.ForeignPtr GHC.Word.Word8) !Int !Int (i.e. a pointer to a native char* + offset + length). Since the data constructor of ByteString is hidden, you must use functions to access the data, not patterns.


Here a solution (surely not the best one) to your UTF-16 filter problem using the text package:

module Test where

import Data.ByteString as BS
import Data.Text as T
import Data.Text.IO as TIO
import Data.Text.Encoding

removeAll :: Char -> Text -> Text
removeAll c t =  T.filter (/= c) t

main = do
  bytes <- BS.readFile "test.txt"
  TIO.putStr $ removeAll 'c' (decodeUtf16LE bytes)
dave4420
  • 46,404
  • 6
  • 118
  • 152
gawi
  • 13,940
  • 7
  • 42
  • 78
  • Didn't know that bit about patterns and data constructors. Since, as noted below, ByteString doesn't export its constructors, this makes sense now. Thanks to all who answered. – LOS Oct 30 '10 at 20:58
6

For this, I would pattern match on the result of uncons :: ByteString -> Maybe (Word8, ByteString).

Pattern matching in Haskell only works on constructors declared with 'data' or 'newtype.' The ByteString type doesn't export its constructors you cannot pattern match.

jrockway
  • 42,082
  • 9
  • 61
  • 86
Antoine Latter
  • 1,545
  • 10
  • 13
2

Just to address the error message you received and what it means:

Couldn't match expected type `BS.ByteString'
       against inferred type `[a]'
In the pattern: []
In the definition of `dropR': dropR [] = []

So the compiler expected your function to be of type: BS.ByteString -> BS.ByteString because you gave it that type in your signature. Yet it inferred (by looking at the body of your function) that the function is actually of type [a] -> [a]. There is a mismatch there so the compiler complains.

The trouble is you are thinking of (:) and [] as syntactic sugar, when they are actually just the constructors for the list type (which is VERY different from ByteString).

Cubic
  • 14,902
  • 5
  • 47
  • 92
jberryman
  • 16,334
  • 5
  • 42
  • 83