How can a monadic/sequential migration be implemented for data in acid-state?

Question

Current state

I have two data types.

data Foo = Foo
  { fooId :: RecordId Foo
  , bars  :: [RecordId Bar]
  ...
  }

data Bar = Bar
  { barId :: RecordId Bar
  ...
  }

This schema allows for each Foo to refer to an arbitrary list of Bars. Clearly, Bars can be shared among any number of Foos, or no Foos.

I already have data persisted in acid-state that uses this schema.

Desired state

data Foo = Foo
  { fooId :: RecordId Foo
  ...
  }

data Bar = Bar
  { barId :: RecordId Bar
  , fooId :: RecordId Foo
  ...
  }

In the desired state, each Bar must have exactly one Foo, as in common many-to-one SQL foreign key relationships.

The Problem

Now of course, there is no way to perfectly transition between these two states, as the latter is less expressive than the former. However, I can write code that deals with any ambiguity here (for duplicate references, prefer the Foo with the smallest fooId, and simply delete any Bars that are not referenced by a Foo).
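The disambiguation rule described above can be written as a pure function. This is only a minimal sketch: `FooId`, `BarId`, `chooseOwners` and `resolveBars` are hypothetical names, plain `Int`s stand in for `RecordId Foo` and `RecordId Bar`, and each old Foo is reduced to a pair of its id and the Bar ids it lists.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for RecordId Foo / RecordId Bar.
type FooId = Int
type BarId = Int

-- For every Bar that is referenced at least once, pick the smallest
-- fooId among the Foos that list it.
chooseOwners :: [(FooId, [BarId])] -> Map.Map BarId FooId
chooseOwners foos =
  Map.fromListWith min [ (bar, foo) | (foo, bars) <- foos, bar <- bars ]

-- Pair each surviving Bar with its chosen owner. Bars that no Foo
-- references have no entry in the map and are dropped.
resolveBars :: [(FooId, [BarId])] -> [BarId] -> [(BarId, FooId)]
resolveBars foos allBars =
  [ (bar, owner) | bar <- allBars, Just owner <- [Map.lookup bar owners] ]
  where
    owners = chooseOwners foos
```

For example, with Foo 2 listing Bars 10 and 11, Foo 1 listing Bar 10, and Bars 10, 11 and 12 stored, Bar 10 goes to Foo 1, Bar 11 to Foo 2, and Bar 12 is deleted.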

My issue is that I cannot see any path, using SafeCopy, to migrate between these two schemas. As far as I can tell, SafeCopy defines migrations as pure functions between types, and I cannot query the acid-state store from inside a migrate function. What I need here, though, is a migration that runs once, on the state at a specific point in time, and converts one schema into the other. With a database this would be trivial, but with acid-state I just can't see my way forward.

The only idea I have for a solution is a separate program (or, say, a command-line feature callable from the main program) compiled specifically to run the few lines of code needed for the data migration (so that all Foo_v0 and Bar_v0 values are converted to Foo_v1 and Bar_v1), after which I would simply swap in the new schema in my main program.

However, I don't even see how this could work. As I understand SafeCopy, if I define migrations to the new schema in the normal way, then as soon as I try to access the data I will be given a value of the new data type, which of course no longer contains the information I need to perform the migration.

One option (a clumsy one, it seems to me) might be to define two further data types, copy the data across to them, then change the schema and run a migration that copies the data back into the new schema, and finally remove the extra data types. That requires three separately compiled versions of the program to be run on the data in sequence, which does not seem very elegant!

Any pointers would be greatly appreciated.

Edit: Possible Solution

I neglected to mention that the schema above is wrapped in a data type that represents the entire state of the program, like

data DB = DB {
  dbFoos :: [Foo],
  dbBars :: [Bar]
}

I think this means that all I need to do is define a new DB data type and write a migration from DB_v0 to DB, handling my data there without any need for sequencing or monadic activity. I will experiment with this and post it as an answer if successful.

You can possibly get away with a "many-to-many join table" system `C: (Id C, Id A, Id B)` which sits between them. Then you can maybe "weed out" the C's which share B's transactionally, rather than having to put all of that logic inside the migration. But I haven't used Data.SafeCopy and Data.Acid so I cannot answer you in depth. - CR Drost, Mar 12 '15 at 20:09

Many thanks for the input. Something along these lines might work in the sense that I could fill up the join table with all references to Bar in Foo, but I think it would have the same issue: within SafeCopy, the migrate function from Foo_v0 to Foo_v1 would effectively mask the original data. I would still need to run one computation with the old schema to fill up the join table, and then recompile with the new schema and proceed. - matchwood, Mar 12 '15 at 20:49

score 1 · Accepted Answer · answered Mar 13 '15 at 15:47

In my particular circumstance, because the state was wrapped by a single DB type, the solution was to write a migration for the top-level type. The Migrate instance therefore had access to all of the data, so it could run the logic needed to complete the migration. The solution looks something like this:

data DB = DB {
  dbFoos :: [Foo],
  dbBars :: [Bar]
}

data DB_v0 = DB_v0 {
  v0_dbFoos :: [Foo_v0],
  v0_dbBars :: [Bar_v0]
}

data Foo = Foo
  { fooId :: RecordId Foo
  ...
  }

data Bar = Bar
  { barId :: RecordId Bar
  , fooId :: RecordId Foo
  ...
  }

data Foo_v0 = Foo_v0
  { v0_fooId :: RecordId Foo
  , v0_bars  :: [RecordId Bar]
  ...
  }

data Bar_v0 = Bar_v0
  { v0_barId :: RecordId Bar
  ...
  }

instance Migrate DB where
  type MigrateFrom DB = DB_v0
  migrate dbV0 = DB
    { dbFoos = migrateOldFoos
    , dbBars = migrateOldBars
    }
    where
      -- The whole old state (dbV0) is in scope in both helpers.
      migrateOldFoos :: [Foo]
      migrateOldFoos = ...
      migrateOldBars :: [Bar]
      migrateOldBars = ...

This goes together with the relevant Migrate instances for Foo_v0 to Foo and Bar_v0 to Bar. One potential gotcha is that the definition of DB_v0 has to reference Foo_v0 and Bar_v0. Otherwise SafeCopy would automatically migrate them to Foo and Bar while deserializing, and the old data would already be gone before you could use it in the Migrate DB instance.
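To make the wiring concrete, here is a hedged, self-contained sketch of how the pieces might fit together. It is not the code from my application: the record types are cut down to their id fields, plain `Int`s replace `RecordId`, the field `barFooId` and the helper `roundTrip` are names invented for the example, and the version numbers are assumptions. For brevity the new Foo and Bar are declared as `base` types, so the per-record Migrate instances are left out. It needs the safecopy, cereal and containers packages.

```haskell
{-# LANGUAGE TemplateHaskell, TypeFamilies #-}
import Data.SafeCopy
import Data.Serialize (runGet, runPut)
import qualified Data.Map.Strict as Map

-- Old schema: each Foo lists the ids of its Bars.
data Foo_v0 = Foo_v0 { v0_fooId :: Int, v0_bars :: [Int] } deriving (Eq, Show)
data Bar_v0 = Bar_v0 { v0_barId :: Int } deriving (Eq, Show)
-- DB_v0 must mention the *old* record types (the gotcha above).
data DB_v0 = DB_v0 { v0_dbFoos :: [Foo_v0], v0_dbBars :: [Bar_v0] } deriving (Eq, Show)

-- New schema: each Bar names exactly one owning Foo.
data Foo = Foo { fooId :: Int } deriving (Eq, Show)
data Bar = Bar { barId :: Int, barFooId :: Int } deriving (Eq, Show)
data DB = DB { dbFoos :: [Foo], dbBars :: [Bar] } deriving (Eq, Show)

$(deriveSafeCopy 0 'base ''Foo_v0)
$(deriveSafeCopy 0 'base ''Bar_v0)
$(deriveSafeCopy 0 'base ''DB_v0)
-- Simplification: no per-record migration in this sketch.
$(deriveSafeCopy 1 'base ''Foo)
$(deriveSafeCopy 1 'base ''Bar)

-- The top-level migration sees the whole old state at once.
instance Migrate DB where
  type MigrateFrom DB = DB_v0
  migrate (DB_v0 oldFoos oldBars) = DB
    { dbFoos = [ Foo f | Foo_v0 f _ <- oldFoos ]
    , dbBars = [ Bar b f | Bar_v0 b <- oldBars
                         , Just f <- [Map.lookup b owners] ]
    }
    where
      -- Smallest fooId wins; unreferenced Bars get no entry and are dropped.
      owners = Map.fromListWith min
                 [ (b, f) | Foo_v0 f bs <- oldFoos, b <- bs ]

$(deriveSafeCopy 1 'extension ''DB)

-- Serialize an old state and read it back as the new type, which is
-- roughly what happens when old data is loaded by the new program.
roundTrip :: DB_v0 -> Either String DB
roundTrip = runGet safeGet . runPut . safePut
```

The order of the declarations matters here: the Template Haskell splices for the old types come before the Migrate instance, and the instance comes before the splice that derives SafeCopy for DB.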

SafeCopy = awesome
