
I'm working on a network streaming client that needs to talk to a server. The server encodes its responses as bytestrings, for example "1\NULJohn\NULTeddy\NUL501\NUL", where '\NUL' is the separator. This response translates to "This is a message of type 1 (hard-coded by the server), which tells the client the ID of a user" (here, the user ID of "John Teddy" is "501").

So naively I define a custom data type

data User = User
  { firstName :: String
  , lastName :: String
  , id :: Int
  }

and a parser for this data type

parseID :: Parser User
parseID = ...

Then one just writes a handler to do some job (e.g., write to a database) after the parser successfully matches a response like this. This is very straightforward.
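For reference, one such hand-written parser could look roughly like the sketch below. It assumes ReadP from base instead of whatever Parser type the client actually uses, and it renames the id field to userId to avoid clashing with Prelude's id; the helpers field and intField are made-up names for illustration.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data User = User
  { firstName :: String
  , lastName  :: String
  , userId    :: Int      -- called "id" in the question; renamed here
  } deriving (Eq, Show)

-- One text field: everything up to the next NUL, which is consumed.
field :: ReadP String
field = munch (/= '\NUL') <* char '\NUL'

-- One numeric field: decimal digits followed by the NUL separator.
intField :: ReadP Int
intField = read <$> munch1 isDigit <* char '\NUL'

-- Message type 1: the tag "1", then first name, last name and user id.
parseID :: ReadP User
parseID = do
  _ <- string "1" <* char '\NUL'
  User <$> field <*> field <*> intField

main :: IO ()
main = print (readP_to_S (parseID <* eof) "1\NULJohn\NULTeddy\NUL501\NUL")
```

Only the tag and the list of fields change from one message type to the next, which is the repetition the question is about.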

However, the server has almost 100 different response types like this that the client needs to parse. I suspect there must be a much more elegant way to do the job than writing 100 almost identical parsers, because, after all, all Haskell coders are lazy. I am a total newbie to generic programming, so can someone tell me if there is a package that can do this job?

user2812201
  • Generics can do this. You could build a generic parser on top of something like attoparsec with a `Parseable` typeclass that provides a default implementation for anything implementing `Generic`. Then you just need `instance Parseable User where` to be able to parse it. – bheklilr Mar 03 '17 at 16:31
  • Good to know. Where can I find more detail? I did google "attoparsec generic parsable"; however, the search results were not very helpful. – user2812201 Mar 03 '17 at 16:41
  • `Parsable` would be the type class you write yourself. Attoparsec is a library that is decent at parsing bytestrings. Generic is a built-in typeclass that provides functions for getting a generic representation of a data type that can be manipulated in code. For example, aeson provides a `FromJSON` typeclass that can take advantage of `Generic` so that you can do `instance FromJSON MyType where` without any extra work to get the ability to parse JSON to values of `MyType`. – bheklilr Mar 03 '17 at 16:57
  • @user2812201 The `GHC.Generics` docs have [an example of generic programming for an `Encode` type class](https://hackage.haskell.org/package/base-4.9.1.0/docs/GHC-Generics.html#g:10). Similar sort of idea, except encoding instead of decoding. – Alec Mar 03 '17 at 17:04
  • Thank you bheklilr. So I do something along the lines of `class Parsable where parse :: ByteString -> Parser Parsable` and `data User ... deriving Generic` and `instance Parsable User where parse = ...`? I will check the source code of aeson and figure it out. – user2812201 Mar 03 '17 at 17:05
  • Related: http://stackoverflow.com/questions/38248692/whats-a-better-way-of-managing-large-haskell-records – danidiaz Mar 03 '17 at 17:23
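The approach suggested in the comments (a `Parsable` class with a default implementation for anything that has a `Generic` instance) could be sketched as follows. This is a minimal sketch under some assumptions: it uses ReadP from base rather than attoparsec, it only handles single-constructor records, it treats every field as terminated by '\NUL', and it leaves out the leading message-type tag. The names `Parsable`, `GParsable`, `parser` and `gparser` are made up for illustration.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

import Data.Char (isDigit)
import GHC.Generics
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch, munch1, readP_to_S)

-- User-facing class; the default method works for any Generic record
-- whose fields are themselves Parsable.
class Parsable a where
    parser :: ReadP a
    default parser :: (Generic a, GParsable (Rep a)) => ReadP a
    parser = to <$> gparser

-- Helper class that walks the generic representation.
class GParsable f where
    gparser :: ReadP (f p)

instance GParsable U1 where                       -- no fields
    gparser = pure U1

instance (GParsable f, GParsable g) => GParsable (f :*: g) where
    gparser = (:*:) <$> gparser <*> gparser       -- fields in order

instance Parsable c => GParsable (K1 i c) where   -- a single field
    gparser = K1 <$> parser

instance GParsable f => GParsable (M1 i t f) where -- metadata wrapper
    gparser = M1 <$> gparser

-- Base instances: each field ends with the NUL separator.
instance Parsable Int where
    parser = read <$> munch1 isDigit <* char '\NUL'

instance Parsable [Char] where
    parser = munch (/= '\NUL') <* char '\NUL'

data User = User
    { firstName :: String
    , lastName  :: String
    , userId    :: Int
    } deriving (Eq, Show, Generic)

instance Parsable User   -- no body: the generic default is used

main :: IO ()
main = print (readP_to_S (parser <* eof :: ReadP User) "John\NULTeddy\NUL501\NUL")
```

There is deliberately no instance for sums (`:+:`), so a type with several constructors is rejected at compile time instead of being parsed in some arbitrary way.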

2 Answers


For these kinds of problems I turn to generics-sop instead of using generics directly. generics-sop is built on top of Generics and provides functions for manipulating all the fields in a record in a uniform way.

In this answer I use the ReadP parser which comes with base, but any other Applicative parser would do. Some preliminary imports:

{-# language DeriveGeneric #-}
{-# language FlexibleContexts #-}
{-# language FlexibleInstances #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
{-# language TypeApplications #-} -- for the Proxy

import Text.ParserCombinators.ReadP (ReadP,readP_to_S)
import Text.ParserCombinators.ReadPrec (readPrec_to_P)
import Text.Read (readPrec)
import Data.Proxy
import qualified GHC.Generics as GHC
import Generics.SOP

We define a typeclass that can produce an Applicative parser for each of its instances. Here we define only the instances for Int and Bool:

class HasSimpleParser c where
    getSimpleParser :: ReadP c

instance HasSimpleParser Int where
    getSimpleParser = readPrec_to_P readPrec 0

instance HasSimpleParser Bool where
    getSimpleParser = readPrec_to_P readPrec 0

Now we define a generic parser for records in which every field has a HasSimpleParser instance:

recParser :: (Generic r, Code r ~ '[xs], All HasSimpleParser xs) => ReadP r
recParser = to . SOP . Z <$> hsequence (hcpure (Proxy @HasSimpleParser) getSimpleParser)

The `Code r ~ '[xs], All HasSimpleParser xs` constraint means "this type has only one constructor, the list of field types is `xs`, and all the field types have `HasSimpleParser` instances".

hcpure constructs an n-ary product (NP) where each component is a parser for the corresponding field of r. (NP products wrap each component in a type constructor, which in our case is the parser type ReadP).

Then we use hsequence to turn an n-ary product of parsers into a parser of an n-ary product.

Finally, we fmap into the resulting parser and turn the n-ary product back into the original r record using to. The Z and SOP constructors are required for turning the n-ary product into the sum-of-products the to function expects.
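To make these steps more concrete: for a record with one Int field and one Bool field, the generic parser should behave like the hand-written applicative chain below, with one parser per field, run in order and fed to the constructor. This is only an illustration using base; `Pair` and `pairParser` are made-up names, and the class is repeated so the snippet stands on its own.

```haskell
import Text.ParserCombinators.ReadP (ReadP, readP_to_S)
import Text.ParserCombinators.ReadPrec (readPrec_to_P)
import Text.Read (readPrec)

class HasSimpleParser c where
    getSimpleParser :: ReadP c

instance HasSimpleParser Int where
    getSimpleParser = readPrec_to_P readPrec 0

instance HasSimpleParser Bool where
    getSimpleParser = readPrec_to_P readPrec 0

-- Hypothetical two-field type standing in for any such record.
data Pair = Pair Int Bool deriving (Eq, Show)

-- What the generic version amounts to for Pair:
--   hcpure    picks getSimpleParser for each field type,
--   hsequence runs those parsers left to right,
--   to . SOP . Z rebuilds the value, here simply the constructor.
pairParser :: ReadP Pair
pairParser = Pair <$> getSimpleParser <*> getSimpleParser

main :: IO ()
main = print (readP_to_S pairParser "55False")
```

The generic definition saves you from writing one such chain per record type.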


Ok, let's define an example record and make it an instance of Generics.SOP.Generic:

data Foo = Foo { x :: Int, y :: Bool } deriving (Show, GHC.Generic)

instance Generic Foo -- Generic from generics-sop

Let's check if we can parse Foo with recParser:

main :: IO ()
main = do
    print $ readP_to_S (recParser @Foo) "55False"

The result is

[(Foo {x = 55, y = False},"")]
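The same idea can be pointed at the NUL-separated messages from the question. The following is a sketch under some assumptions, not a drop-in solution: every field (including the last) is terminated by '\NUL', the message type is a literal tag at the front, and the names `HasFieldParser`, `fieldParser` and `msgParser` are invented for this example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Char (isDigit)
import Data.Proxy
import qualified GHC.Generics as GHC
import Generics.SOP
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch, munch1, readP_to_S, string)

-- One parser per field type; each one also consumes the trailing NUL.
class HasFieldParser c where
    fieldParser :: ReadP c

instance HasFieldParser Int where
    fieldParser = read <$> munch1 isDigit <* char '\NUL'

instance HasFieldParser [Char] where
    fieldParser = munch (/= '\NUL') <* char '\NUL'

-- Parse the message-type tag, then all fields of a one-constructor record.
msgParser :: (Generic r, Code r ~ '[xs], All HasFieldParser xs) => String -> ReadP r
msgParser tag =
    string tag *> char '\NUL' *>
    (to . SOP . Z <$> hsequence (hcpure (Proxy @HasFieldParser) fieldParser))

data User = User
    { firstName :: String
    , lastName  :: String
    , userId    :: Int
    } deriving (Eq, Show, GHC.Generic)

instance Generic User -- Generic from generics-sop

main :: IO ()
main = print (readP_to_S (msgParser @User "1" <* eof) "1\NULJohn\NULTeddy\NUL501\NUL")
```

With this in place, each additional response type costs a record declaration, an empty `Generic` instance and its tag.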
danidiaz

You can write your own parser, but there is already a package that can do the parsing for you: cassava. While SO is usually not the place for library recommendations, I want to include this answer for people who are looking for a solution that works out of the box and don't have the time to implement one themselves.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Csv
import Data.Vector (Vector)                 -- only the type is needed
import qualified Data.ByteString.Lazy as B  -- qualified, to avoid Prelude clashes
import GHC.Generics (Generic)

data Person = P { personId :: Int
                , firstName :: String
                , lastName :: String
                } deriving (Eq, Generic, Show)

 -- the following are provided by friendly neighborhood Generic
instance FromRecord Person
instance ToRecord Person

main :: IO ()
main = do B.writeFile "test" "1\NULThomas\NULof Aquin"
          Right thomas <- decodeWith (DecodeOptions 0) NoHeader <$> 
                              B.readFile "test"

          print (thomas :: Vector Person)

Basically, cassava allows you to parse any X-separated structure into a Vector, provided you can write down a FromRecord instance (which needs a parseRecord :: Parser … function to work).
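If you would rather not rely on the Generic-derived instance, a hand-written FromRecord instance could look like the sketch below. It follows the pattern from the cassava documentation (index into the record with `.!` and fail with `mzero` on the wrong number of fields) and sets the delimiter through `defaultDecodeOptions` instead of applying the `DecodeOptions` constructor positionally. `nulOptions` and `decodePeople` are names made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (mzero)
import qualified Data.ByteString.Lazy as B
import Data.Csv
import Data.Vector (Vector)

data Person = P
    { personId  :: Int
    , firstName :: String
    , lastName  :: String
    } deriving (Eq, Show)

-- Manual instance: pick the fields out of the record by position.
instance FromRecord Person where
    parseRecord v
        | length v == 3 = P <$> v .! 0 <*> v .! 1 <*> v .! 2
        | otherwise     = mzero

-- Same options as the default, except the delimiter is the NUL byte.
nulOptions :: DecodeOptions
nulOptions = defaultDecodeOptions { decDelimiter = 0 }

decodePeople :: B.ByteString -> Either String (Vector Person)
decodePeople = decodeWith nulOptions NoHeader

main :: IO ()
main = print (decodePeople "1\NULThomas\NULof Aquin")
```

Using record update on `defaultDecodeOptions` keeps the code working if more options are ever added to `DecodeOptions`.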

Side note on Generic: until recently I thought that everything in Haskell has a Generic instance, or can derive one. Well, this is not the case: I wanted to serialize some ThreadId to CSV/JSON and found out that unboxed types are not so easily "genericked"!

And before I forget: since you mention streaming and a server, there is also cassava-conduit, which might be of help.

epsilonhalbe