
I have data that looks like this:

data Test = Test {
  id :: Int,
  rating :: Maybe Float,
  amount :: Maybe Int,
  reviewHeader :: Maybe String,
  reviewDescription :: Maybe String
  }
  deriving (Typeable, Data, Eq, Show, GHC.Generic)

testList :: [Test]

I would like to consolidate testList into a single Test. The id is the same for every Test in the list. For each of the other fields, the merge rule is: if every Test has either the same value or Nothing, the result is that value; if any two Tests have different values, the result is Nothing. For example:

t1 :: Test
t1 = Test 1 (Just 1.1) (Just 2) Nothing (Just "t22")
    
t2 :: Test
t2 = Test 1 (Just 2.1) (Just 2) (Just "t1") (Just "t22")

t3 :: Test
t3 = Test 1 Nothing Nothing (Just "t2") (Just "t22")

t4 :: Test
t4 = Test 1 (Just 1.1) Nothing Nothing (Just "t22")

testList = [t1, t2, t3, t4]

output = function testList

-- output equals Test 1 Nothing (Just 2) Nothing (Just "t22")

I understand how I would do this for two values, but I need to do it for a whole list. Also, my real data has 20+ records and multiple versions, so I would like to use Generics.

Alex Chen
  • I know you were looking for a solution with generics, but maybe you could do it without them? Take a look at https://gist.github.com/snwfx/45bb52a15132d2a2d9188ace79100596. Yes, it requires writing such a `function` for every data type you have, and manually listing all the fields in each implementation. Though I guess you would write it once and change it only when the data type declaration changes. –  Jul 17 '20 at 15:04
  • This solution would work but I wanted to avoid doing something like this just because the number of fields is very large for most of the data types I need to perform this on. Additionally I did not want to make a separate function for each data type. I may do something like this in a future project though, so thank you! – Alex Chen Jul 17 '20 at 16:13

1 Answer


Header stuff:

{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, PolyKinds, TypeFamilies, TypeOperators #-}
import Data.Kind(Type)
import Data.Maybe(fromJust)
import Data.Semigroup(First(..))
import GHC.Generics((:*:)(..), Generic, K1(..), M1(..), Rep, from, to)
import GHC.Exts(Any)

Let's use this thing from a previous answer:

data Same a = Vacuous | Fail | Same a
instance Eq a => Semigroup (Same a) where
    Vacuous    <> x       = x
    Fail       <> _       = Fail
    s@(Same l) <> Same r  = if l == r then s else Fail
    x          <> Vacuous = x
    _          <> Fail    = Fail
instance Eq a => Monoid (Same a) where
    mempty = Vacuous

We can inject Maybe into Same:

maybeSame :: Maybe a -> Same a
maybeSame = maybe Vacuous Same

and we can collapse the other way:

sameMaybe :: Same a -> Maybe a
sameMaybe (Same x) = Just x
sameMaybe _ = Nothing
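
To see how these pieces behave on a single field before going generic, here is a minimal sketch. It repeats the definitions above so that it compiles on its own; `mergeField`, `amounts`, and `ratings` are hypothetical names introduced only for this illustration, with values taken from the question's example.

```haskell
-- Self-contained copy of the definitions above.
data Same a = Vacuous | Fail | Same a

instance Eq a => Semigroup (Same a) where
    Vacuous    <> x       = x
    Fail       <> _       = Fail
    s@(Same l) <> Same r  = if l == r then s else Fail
    x          <> Vacuous = x
    _          <> Fail    = Fail

instance Eq a => Monoid (Same a) where
    mempty = Vacuous

maybeSame :: Maybe a -> Same a
maybeSame = maybe Vacuous Same

sameMaybe :: Same a -> Maybe a
sameMaybe (Same x) = Just x
sameMaybe _        = Nothing

-- Hypothetical helper (not part of the answer): merge the values of one field.
-- Nothing is ignored, equal values are kept, conflicting values give Nothing.
mergeField :: Eq a => [Maybe a] -> Maybe a
mergeField = sameMaybe . foldMap maybeSame

-- The `amount` column of t1..t4: all present values agree.
amounts :: Maybe Int
amounts = mergeField [Just 2, Just 2, Nothing, Nothing]      -- Just 2

-- The `rating` column of t1..t4: 1.1 and 2.1 conflict.
ratings :: Maybe Float
ratings = mergeField [Just 1.1, Just 2.1, Nothing, Just 1.1] -- Nothing
```

The generic code that follows does the same thing, but for every field of a record at once.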

Let's apply both to every field within a generic representation:

class Monoid (MaybeSameAllRep rep p) => GMaybeSameAll rep p where
    type MaybeSameAllRep rep :: k -> Type
    gMaybeSameAll :: rep p -> MaybeSameAllRep rep p
    gSameMaybeAll :: MaybeSameAllRep rep p -> rep p
type family ForBase (x :: Type) :: Type where
    ForBase (Maybe x) = Same x
    ForBase x = Maybe (First x)
instance {-# OVERLAPS #-} Eq a => GMaybeSameAll (K1 i (Maybe a)) p where
    gMaybeSameAll = K1 . maybeSame . unK1
    gSameMaybeAll = K1 . sameMaybe . unK1
instance ForBase c ~ Maybe (First c) => GMaybeSameAll (K1 i c) p where
    type MaybeSameAllRep (K1 i c) = K1 i (ForBase c)
    gMaybeSameAll = K1 . Just . First . unK1
    gSameMaybeAll = K1 . getFirst . fromJust . unK1
instance (GMaybeSameAll l p, GMaybeSameAll r p) => GMaybeSameAll (l :*: r) p where
    type MaybeSameAllRep (l :*: r) = MaybeSameAllRep l :*: MaybeSameAllRep r
    gMaybeSameAll (l :*: r) = gMaybeSameAll l :*: gMaybeSameAll r
    gSameMaybeAll (l :*: r) = gSameMaybeAll l :*: gSameMaybeAll r
instance (GMaybeSameAll r p) => GMaybeSameAll (M1 i c r) p where
    type MaybeSameAllRep (M1 i c r) = M1 i c (MaybeSameAllRep r)
    gMaybeSameAll = M1 . gMaybeSameAll . unM1
    gSameMaybeAll = M1 . gSameMaybeAll . unM1

And so, everything boils down to just converting and combining:

combine :: (Foldable f, Generic t, GMaybeSameAll (Rep t) Any) => f t -> t
combine = post . foldMap pre
    where post :: (Generic t, GMaybeSameAll (Rep t) Any) => MaybeSameAllRep (Rep t) Any -> t
          post = to . gSameMaybeAll
          pre :: (Generic t, GMaybeSameAll (Rep t) Any) => t -> MaybeSameAllRep (Rep t) Any
          pre = gMaybeSameAll . from

A lot of this is rather ugly; does anyone have any better ideas?

output = combine testList
-- = Test 1 Nothing (Just 2) Nothing (Just "t22"), as desired
HTNW
  • I seem to be getting an error on `gMaybeSameAll = K1 . Just . First . unK1` and `gSameMaybeAll = K1 . getFirst . fromJust . unK1` saying it cannot construct the infinite type `c ~ Maybe c` in ForBase. I'm fairly new to Haskell, so I'm having trouble resolving this based on other posts I have seen. – Alex Chen Jul 14 '20 at 20:25
  • @AlexChen There are two `First`s. I meant `Data.Semigroup.First`. My fault for not specifying imports. Editing... – HTNW Jul 14 '20 at 20:33
  • Thank you! Do you have any tips for how to get comfortable with Generics? I have been thrown a bit headfirst into Generics and would like to somehow understand it all a bit better beyond just reading the docs. – Alex Chen Jul 14 '20 at 20:34
  • @AlexChen I'll be real here: I wrote this basically just by modifying the `Encode` example in the docs. That's the only `Generic` specific part. Everything else is just being comfortable with using the type-system as another programming language. (Thankfully, type-level Haskell is more similar to Haskell than e.g. type-level C++ is to C++.) A lot of my previous Haskell answers are in this vein, perhaps they'd help. A short list of tricks: types are compile-time data structures; type families are functions between such structures; type classes are functions from types to values. – HTNW Jul 14 '20 at 20:39
  • @HTNW: And multiparameter type classes are relations amongst types, so a bit of logic programming knowledge comes in handy :) – Jon Purdy Jul 14 '20 at 23:12
  • @HTNW I have realized that I am in a Safe Haskell environment. Is there an alternative to using GHC.Exts (Any) that is safe? – Alex Chen Jul 16 '20 at 16:40
  • @AlexChen Yes: just use any type. Literally, whatever type you want, just replace `Any` with it. It should work. In order to decrease the chances of someone messing with your code, create a dummy type in your module (`data Whatever = Whatever`) and just don't export it. Or, just define `Any` yourself: it is simply `type family Any :: k where` (the `where` and the absence of anything under it are essential; it's an undefined closed family). The second parameter to `Rep` is phantom, so it doesn't do anything. Honestly, I have no idea what it's for... Maybe it's something `Generic1` related... – HTNW Jul 16 '20 at 16:45
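
The two alternatives from the last comment can be sketched as follows. This is only an illustration of the idea, not code from the answer: `Pair`, `roundTripWhatever`, and `roundTripAny` are hypothetical names, and the sketch assumes that avoiding the `GHC.Exts` import is all the Safe Haskell environment requires.

```haskell
{-# LANGUAGE DeriveGeneric, PolyKinds, TypeFamilies #-}
import GHC.Generics (Generic, Rep, from, to)

-- Option 1: a private dummy type; simply leave it out of the export list.
data Whatever = Whatever

-- Option 2: a local stand-in for GHC.Exts.Any, written as in the comment:
-- a closed type family with no equations.
type family Any :: k where

-- Hypothetical record used only to exercise the two options.
data Pair = Pair Int Bool
    deriving (Eq, Show, Generic)

-- The parameter of the representation is phantom, so either choice
-- round-trips a value through its generic representation unchanged.
roundTripWhatever :: Pair -> Pair
roundTripWhatever x = to (from x :: Rep Pair Whatever)

roundTripAny :: Pair -> Pair
roundTripAny x = to (from x :: Rep Pair Any)
```

In `combine`, the same substitution would mean replacing each `Any` in the type signatures with `Whatever` or with the locally defined `Any`.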