
First up, a simplified version of the task I want to accomplish: I have several large files (amounting to 30GB) that I want to prune of duplicate entries. To this end, I set up a database of hashes of the data, open the files one by one, hash each item, and record it in the database and the output file iff its hash wasn't already in the database.
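For reference, the pruning logic itself, independent of conduit and persistent, fits in a few lines of pure Haskell. This is a minimal sketch under stated assumptions: a `Data.Set` of hashes stands in for the hash database, and `fnv1a` is a hypothetical 64-bit FNV-1a hash over characters (not bytes) standing in for whatever hash the real code uses.

```haskell
import qualified Data.Set as Set
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)

-- Hypothetical stand-in for the real hash: 64-bit FNV-1a over the
-- characters' code points (not over bytes, for simplicity).
fnv1a :: String -> Word64
fnv1a = foldl step 14695981039346656037
  where step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Keep an item iff its hash has not been seen before.
-- The Set plays the role of the hash database.
dedup :: [String] -> [String]
dedup = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | h `Set.member` seen = go seen xs                  -- duplicate: drop it
      | otherwise           = x : go (Set.insert h seen) xs -- new: record and emit
      where h = fnv1a x
```

As with any hash-only scheme, two distinct items whose hashes collide would be treated as duplicates; keying the set on the items themselves avoids that at the cost of memory.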

I know how to do this with iteratees/enumerators, and I wanted to try conduits. I also know how to do it with plain conduits, but now I want to combine conduits with persistent. I'm having problems with the types, and possibly with the whole concept of ResourceT.

Here's some pseudo code to illustrate the problem:

```haskell
withSqlConn "foo.db" $ runSqlConn $ runResourceT $
     sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"
```

The problem lies in the dbAction function, which naturally needs to access the database. Since what it does is basically just a filter, I first thought to write it like this:

```haskell
dbAction = CL.mapMaybeM p
     where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
           p x = do lift $ putStrLn "foo" -- fine
                    insert $ undefined    -- type error!
                    return undefined
```

The specific error I get is:

```
Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                           DataType -> m (Maybe DataType)
  at tools/clean-wac.hs:(33,1)-(34,34)
  `m' is a rigid type variable bound by
      the type signature for
        p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                      DataType -> m (Maybe (DataType))
      at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
  Actual type: b0 m0 (Key b0 val0)
```

Note that this might be due to wrong assumptions I made in designing the type signature. If I comment out the type signature and also remove the lift statement, the error message turns into:

```
No instance for (PersistStore ResourceT (SqlPersist IO))
  arising from a use of `p'
Possible fix:
  add an instance declaration for
  (PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'
```

So does this mean that we can't access the PersistStore via ResourceT at all?
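Stripped of persistent, the step that `CL.mapMaybeM` wants is just a function `a -> m (Maybe a)`, and it only type-checks when the database operations live in that same `m`. A sketch under assumptions: `Store`, `keepIfNew`, `mapMaybeMList` and `ioStore` are hypothetical names, an `IORef` holding a `Set` stands in for the database, and a list traversal stands in for the conduit.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

-- Hypothetical record standing in for the database. Both operations run in
-- the same monad m as the filter step; that shared m is exactly what the
-- type signature of p has to get right.
data Store m k = Store
  { member :: k -> m Bool
  , record :: k -> m ()
  }

-- The filter step in the shape CL.mapMaybeM expects (a -> m (Maybe b)):
-- return Just x (after recording its key) if the key is new, else Nothing.
keepIfNew :: Monad m => Store m k -> (a -> k) -> a -> m (Maybe a)
keepIfNew store key x = do
  let k = key x
  seen <- member store k
  if seen
    then return Nothing
    else record store k >> return (Just x)

-- A list-based stand-in for CL.mapMaybeM.
mapMaybeMList :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b]
mapMaybeMList f xs = catMaybes <$> mapM f xs

-- An IORef-backed store, so the whole thing runs in IO.
ioStore :: Ord k => IO (Store IO k)
ioStore = do
  ref <- newIORef Set.empty
  return Store
    { member = \k -> Set.member k <$> readIORef ref
    , record = \k -> modifyIORef' ref (Set.insert k)
    }
```

In the question's signature, by contrast, `m` is an arbitrary rigid monad while `insert` yields an action in a transformer-shaped monad (the `b0 m0` in the first error), so GHC cannot equate the two.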

I can't write my own Conduit without `CL.mapMaybeM` either:

```haskell
dbAction = filterP
filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
    where loop = awaitE >>= either return go
          go s = do lift $ insert $ undefined -- again, type error
                    loop
```

This resulted in yet another type error I don't fully understand.

```
Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             filterP :: (MonadIO m,
                                 MonadBaseControl IO (SqlPersist m)) =>
                                Conduit DataType m DataType
  `m' is a rigid type variable bound by
      the type signature for
        filterP :: (MonadIO m,
                            MonadBaseControl IO (SqlPersist m)) =>
                           Conduit DataType m DataType
Expected type: Conduit DataType m DataType
  Actual type: Pipe
                 DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'
```
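The same mismatch appears in the hand-written loop because `lift` can only embed actions of the pipe's own base monad. A toy model makes this concrete. It is a sketch under assumptions, not the real `Data.Conduit` internals: `Pipe` here is a hypothetical four-constructor type whose `Effect` constructor plays the role of `lift`, `Db` is a `StateT` over a `Set` standing in for the database monad, and `insertIfNew` stands in for persistent's `insert`.

```haskell
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, modify)
import qualified Data.Set as Set

-- A toy model of a conduit (NOT the real Data.Conduit Pipe type): await
-- input, yield output, run an action in the base monad m, or finish.
data Pipe i o m r
  = Await (Maybe i -> Pipe i o m r)
  | Yield o (Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

-- Stand-in for the database monad: a set of seen keys over IO.
type Db = StateT (Set.Set String) IO

-- Hypothetical "insert unless present" action that only exists in Db.
insertIfNew :: String -> Db Bool
insertIfNew k = do
  seen <- get
  if k `Set.member` seen
    then return False
    else modify (Set.insert k) >> return True

-- The filter as an explicit await loop. Its base monad is Db itself, so the
-- database action can be embedded with Effect (the toy analogue of lift).
filterP :: Pipe String String Db ()
filterP = loop
  where
    loop = Await (maybe (Done ()) go)
    go s = Effect $ do
      new <- insertIfNew s
      return (if new then Yield s loop else loop)

-- Feed a list through a pipe, collecting its output in the base monad.
runList :: Monad m => Pipe i o m r -> [i] -> m [o]
runList (Done _) _ = return []
runList (Yield o p) xs = (o :) <$> runList p xs
runList (Effect m) xs = m >>= \p -> runList p xs
runList (Await k) [] = runList (k Nothing) []
runList (Await k) (x:xs) = runList (k (Just x)) xs
```

Because `filterP` names `Db` as its base monad, embedding `insertIfNew s` type-checks; replacing `Db` with a rigid, unconstrained `m` would make it fail for the same reason as the errors above. For example, `evalStateT (runList filterP ["x", "y", "x"]) Set.empty` runs in `IO`.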

So, my question is: is it possible to use persistent inside a conduit the way I intended? And if so, how? I am aware that since I can use liftIO inside the conduit, I could just use, say, HDBC, but I wanted to use persistent explicitly, both to understand how it works and because I like its database-backend agnosticism.

sclv
Aleksandar Dimitrov
  • Have you tried using `lift` instead of `liftIO`? – Michael Snoyman Nov 11 '12 at 17:36
  • Ah, yes, sure `liftIO` imposes a constraint on the entire `do` block. But that only explains why the first error message differs from the second. I'll update the post in a sec, to reflect what'll happen if you remove the liftIO statement. – Aleksandar Dimitrov Nov 11 '12 at 20:03
  • BTW, even `lift` already imposes `IO` restrictions on the monad type. I noted you have to *remove* the `lift` statement altogether to reach that error message. If you don't (but keep `lift $ print ""` in) you instead get `Couldn't match expected type 'SqlPersist m0 a0' with actual type 'IO ()'`. – Aleksandar Dimitrov Nov 11 '12 at 21:11
  • Well, one issue above is `filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType`. What you probably want is `Conduit DataType (SqlPersist m) DataType`. I think that might clear up a fair amount of the problems. – Michael Snoyman Nov 12 '12 at 05:11
  • But that can't possibly work, can it? The `Conduit` is run by `runResourceT` which requires its argument to be instantiated to at least `ResourceT m`, not `SqlPersist m`. It also imposes on `m` the constraint `MonadBaseControl IO m`, so that *has* to be in the conduit's type signature. – Aleksandar Dimitrov Nov 12 '12 at 10:51
  • @AleksandarDimitrov The MonadBaseControl type class is in transformers-base-0.3 and seems to have disappeared in version 0.4.1 which is the current. I'm currently working on a variation of this same problem. – Erik de Castro Lopo Nov 13 '12 at 03:54
  • @ErikdeCastroLopo, if you find a way to solve the issue, I'd be very grateful for an answer. I might also ask haskell-cafe, soon; but I'm up to my ears in work, so I went back to Iteratees (it's what I know best.) I'll play around with this on the weekend again. – Aleksandar Dimitrov Nov 13 '12 at 15:55
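The point about `runResourceT` in the comments can be illustrated with plain `transformers`. In `runSqlConn $ runResourceT $ ...`, `runResourceT` is applied first, so the pipeline's monad is `ResourceT` stacked on top of the persistent monad, and a database action has to be lifted through that outer layer. In this sketch, `ReaderT String` is an arbitrary stand-in for `ResourceT` and `StateT (Set String) IO` for `SqlPersist IO`; `recordKey`, `step` and `runAll` are hypothetical names.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, runReaderT)
import Control.Monad.Trans.State.Strict (StateT, runStateT, get, put)
import qualified Data.Set as Set

-- Stand-ins (assumptions, not the real types):
--   Db    plays the role of SqlPersist IO
--   Outer plays the role of ResourceT layered on top of it
type Db = StateT (Set.Set String) IO
type Outer = ReaderT String Db

-- A database action, defined only in Db.
recordKey :: String -> Db Bool
recordKey k = do
  seen <- get
  if Set.member k seen
    then return False
    else put (Set.insert k seen) >> return True

-- A step running in the outer layer: Db actions are lifted through it once.
step :: String -> Outer (Maybe String)
step s = do
  new <- lift (recordKey s)
  return (if new then Just s else Nothing)

-- Peel the layers in the same order as runSqlConn $ runResourceT $ ...:
-- runReaderT (the inner run) leaves a Db action, runStateT turns it into IO.
runAll :: [String] -> IO ([Maybe String], Set.Set String)
runAll xs = runStateT (runReaderT (mapM step xs) "env") Set.empty
```

Michael Snoyman's suggested signature corresponds to running the step in `Db` directly; keeping the outer layer instead costs one extra `lift` per database call.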

1 Answer

The code below compiles fine for me. Is it possible that the frameworks have moved on in the meantime and things now just work?

However, note the following changes, which I had to make either because the libraries have changed a bit or because I didn't have all your code. I used conduit-1.0.9.3 and persistent-1.3.0 with GHC 7.6.3.

  • Omitted parseBytes and serialize, as I don't have your definitions, and defined DataType = ByteString instead.

  • Introduced a Proxy parameter and an explicit type signature for the undefined value to avoid problems with type family injectivity. These likely don't arise in your real code because it will have a concrete or externally determined type for val.

  • Used await rather than awaitE, and substituted () for the Left case, as awaitE has been retired.

  • Passed a dummy Connection creation function to withSqlConn - perhaps I should have used some Sqlite specific function?
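The Proxy bullet can be illustrated in isolation. When a type variable such as `val` occurs only under a type family (as in `PersistEntityBackend val ~ PersistMonadBackend m`), GHC cannot recover it from the family's result, because different entities can share a backend. A sketch with hypothetical stand-ins: `Entity` and `Backend` play the roles of `PersistEntity` and `PersistEntityBackend`, and `User`/`Post` are made-up entities.

```haskell
{-# LANGUAGE TypeFamilies, ScopedTypeVariables #-}

import Data.Proxy

-- Hypothetical stand-ins for PersistEntity / PersistEntityBackend.
class Entity e where
  type Backend e
  describe :: e -> String

data User = User
data Post = Post

-- Both entities map to the same backend, so Backend is not injective:
-- knowing the backend does not tell GHC which entity is meant.
instance Entity User where
  type Backend User = ()
  describe _ = "user"

instance Entity Post where
  type Backend Post = ()
  describe _ = "post"

-- Without the Proxy argument, e would occur only in the constraints and the
-- signature would be ambiguous; the Proxy lets the caller fix e explicitly,
-- and ScopedTypeVariables lets the body refer to it.
describeFor :: forall e. (Entity e, Backend e ~ ()) => Proxy e -> String
describeFor _ = describe (undefined :: e)
```

Callers choose the entity explicitly, e.g. `describeFor (Proxy :: Proxy User)`. Newer GHCs can instead use `AllowAmbiguousTypes` with type applications, which did not exist in GHC 7.6.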

Here's the code:

```haskell
{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}

module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString (ByteString)
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $
         sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
    where loop = await >>= maybe (return ()) go
          go s = do lift $ insert (undefined :: val)
                    loop
```
Ganesh Sittampalam
  • I asked this so long ago that I barely remembered what it was about, but I think this clears it up. Yes, I think the APIs in question have changed quite a bit since I asked. Thanks! – Aleksandar Dimitrov Jan 03 '14 at 13:48
  • I was actually a bit disappointed when it just worked as I was hoping for a juicy type system problem to think about :-) – Ganesh Sittampalam Jan 03 '14 at 14:05