18

The code below uses an unsafe GeneralizedNewtypeDeriving extension to break Data.Set by inserting different elements with different Ord instances:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Set
import System.Random

class AlaInt i where
  fromIntSet :: Set Integer -> Set i
  toIntSet :: Set i -> Set Integer
instance AlaInt Integer where
  fromIntSet = id
  toIntSet = id
newtype I = I Integer deriving (Eq, Show, AlaInt)
instance Ord I where compare (I n1) (I n2) = compare n2 n1 -- sic!  

insert' :: Integer -> Set Integer -> Set Integer
insert' n s = toIntSet $ insert (I n) $ fromIntSet s

randomInput = take 5000 $ zip (randomRs (0,9) gen) (randoms gen) where
    gen = mkStdGen 911

createSet = Prelude.foldr f empty where
    f (e,True) = insert e
    f (e,False) = insert' e

main = print $ toAscList $ createSet randomInput

The code prints [1,3,5,7,8,6,9,6,4,2,0,9]. Note that the list is unordered and has 9 twice.

Is it possible to perform this dictionary swapping attack using other extensions, e.g. ConstraintKinds? If yes, can Data.Set be redesigned to be resilient to such attacks?

nponeccop
  • 13,527
  • 1
  • 44
  • 106
  • possible duplicate of [Is there a list of GHC extensions that are considered 'safe'?](http://stackoverflow.com/questions/10830757/is-there-a-list-of-ghc-extensions-that-are-considered-safe) – Daniel Wagner Oct 04 '12 at 21:17

1 Answers1

21

I think that's an important question, so I'll repeat my answer from elsewhere: you can have multiple instances of the same class for the same type in Haskell98 without any extensions at all:

$ cat A.hs
module A where
data U = X | Y deriving (Eq, Show)

$ cat B.hs
module B where
import Data.Set
import A
instance Ord U where
    compare X X = EQ
    compare X Y = LT
    compare Y X = GT
    compare Y Y = EQ
ins :: U -> Set U -> Set U
ins = insert

$ cat C.hs
module C where
import Data.Set
import A
instance Ord U where
    compare X X = EQ
    compare X Y = GT
    compare Y X = LT
    compare Y Y = EQ
ins' :: U -> Set U -> Set U
ins' = insert

$ cat D.hs
module D where
import Data.Set
import A
import B
import C
test = ins' X $ ins X $ ins Y $ empty

$ ghci D.hs
Prelude D> test
fromList [X,Y,X]

And yes, you can prevent this kind of attacks by storing the dictionary internally:

data MSet a where MSet :: Ord a => Set a -> MSet a
MigMit
  • 1,698
  • 12
  • 14
  • 1
    `test = (ins X . ins' X . ins' Y) empty` reproduces the problem – nponeccop Oct 05 '12 at 11:07
  • 1
    I would think that storing the dictionary internally prevents you from implementing efficient union and intersection operations, though. – C. A. McCann Oct 05 '12 at 13:03
  • 2
    Could be. Using just one of the dictionaries supplied is as bad as using an external one. – MigMit Oct 05 '12 at 13:24
  • 5
    A slight correction. These modules do not form a valid Haskell 98 program since they define conflicting instances for Ord U. However, GHC accepts them anyways since GHC does not do a global instance uniqueness check: it only does a check when it needs to solve a constraint, which happens in modules B and C, in each of which only one instance is visible. This is a known (and long-standing) deficiency of GHC. (What is not known is how many Haskell programs out there rely on this lack of a global uniqueness check...) – Reid Barton Jan 04 '15 at 16:33
  • 2
    That's true. However, I don't think the problem of verifying global uniqueness can be solved in general — I mean, when you have some extensions enabled, such as `OverlappingInstances` for example. And GHC supports tons of extensions. – MigMit Jan 07 '15 at 10:18