
I'm trying to understand how to use Shake and how to build new rules. As an exercise, I've decided to implement what I call a backup rule.

The idea is to generate a file if it doesn't exist OR if it's too old (let's say more than 24 hours). I like to store long commands in a makefile and run them on demand. An example is a MySQL backup. The only problem is that when the backup already exists, make doesn't do anything. To solve this, I can either

  • remove the previous backup before redoing a new one,
  • make the backup target phony
  • add a fictive force dependency, which I can touch manually or in cron.

What I would like is to redo the backup if it's older than 24 hours (which I can do with a touch force in cron). Anyway, it's only an example to play with Shake. What I would like is something like:

expirable "my_backup" 24 \out -> do
    cmd "mysqldump" backup_parameter out

I read the docs, but I have no idea how to do this, how to define a rule, or what an Action is. I understand that I need to instantiate a Rule class but I can't figure out what is what.

Clarification

I don't want the backup to run automatically, only on demand, and at most once per 24 hours.

An example scenario: I have a production database on a remote machine and a local copy, and I run some time-consuming reports locally. The normal workflow is

  • download production backup
  • refresh the local database with it
  • create some denormalized tables on a local warehouse database
  • generate some report.

I don't run the report every day, only when I need it, so I don't want to run the report every 24 hours. It's easy to do with a makefile except for the timing bit; there are workarounds, but once again it's a contrived example to understand deeply how Shake works.

So, when I first do make report, it backs up the db, runs everything and generates the report. Now, I want to modify the report (because I'm testing it). I don't need the backup to be regenerated (nor the local database to be refreshed): it's the evening, and I know that nothing will change on production until the next day.

Then the next day, or next month, I rerun the report. This time I need the backup to be done again, and everything that depends on it to be rerun as well.

Basically the rule I need is instead of

redo timestamp = timestamp < old

is

redo timestamp = timestamp < old || now > timestamp + 24*3600

But I have no idea where to put this rule.
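
As a rough sketch, the two predicates above can be written as pure Haskell functions. All the names here (`Seconds`, `makeRedo`, `expirableRedo`, `maxAge`) are hypothetical, and I am assuming that `old` means the timestamp of the newest dependency and that all times are plain seconds:

```haskell
-- Times are plain seconds since some epoch (an assumption for illustration).
type Seconds = Integer

-- Maximum age of the target before it expires: 24 hours.
maxAge :: Seconds
maxAge = 24 * 3600

-- Make-style check: rebuild only if the target is older than its dependency.
makeRedo :: Seconds -> Seconds -> Bool
makeRedo targetTime depTime = targetTime < depTime

-- Expirable check: the Make-style check, or the target is more than 24 hours old.
expirableRedo :: Seconds -> Seconds -> Seconds -> Bool
expirableRedo now targetTime depTime =
    makeRedo targetTime depTime || now > targetTime + maxAge
```

This only shows the decision itself; it says nothing about where such a check plugs into Shake.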

The question is more where to put it than how to write it (it's above). If it's easier to explain, I can have a rule which asks the user (getLine) 'do you want to redo this target (yes/no)?'.

Later I will also need a rule depending on the last update of the database (or a specific table). I know how to get the information from the database but not how to integrate it in Shake.

I might be confused about what a Rule is. In make, a rule is about how to make a target (so it's more a recipe), which is what I think the Action is in Shake. Whereas when I say rule, I mean the rule which decides whether to remake the target or not, not how to do it. In make, you don't have the choice (it's the timestamp), so there is no such concept.

mb14
  • An action is anything you can do in `IO` (so basically a program), plus opt-in dependency tracking through the `need` function (and others) in http://hackage.haskell.org/package/shake-0.13.2/docs/Development-Shake.html – Boyd Stephen Smith Jr. Jul 22 '14 at 22:20
  • Do you want to rerun the backup if it hasn't run in 24 hours, or if it hasn't run _because of the timer_ in 24 hours. e.g. If you run at midnight due to the timer, run at midday due to a change, do you still want to run again at midnight due to the timer? – Neil Mitchell Jul 23 '14 at 10:43
  • @neil: I updated the question. However, I'm not bothered about the real *rule algorithm*; anything would do: a random choice, asking the user, etc. I just need to know how to integrate my own `does this need to be rebuilt` code. – mb14 Jul 23 '14 at 11:13
  • To clarify, `Action` is basically `IO` + calling `need`, `Rules` are a set of things you define with `*>`, `Rule` is a type class that is used to define rules such as `*>`. Most people don't define new rules, but it is necessary if you want to integrate your own "does this need to rebuild" code. – Neil Mitchell Jul 23 '14 at 20:31
  • @mb14 I have deliberately answered only the `backup` rule part. The database one would be easier, but I suggest you break that into a separate question if it isn't obvious from the answer below. – Neil Mitchell Jul 24 '14 at 16:21

2 Answers


Here's a solution that partially works:

import Development.Shake
import Control.Monad
import System.Directory as IO
import Data.Time

buildBackupAt :: FilePath -> Action ()
buildBackupAt out = cmd "mysqldump" "-backup" out {- Or whatever -}

-- Argument order chosen for partial application
buildEvery :: NominalDiffTime -> (FilePath -> Action ()) -> FilePath -> Action ()
buildEvery secs act file = do
    alwaysRerun
    exists <- liftIO $ IO.doesFileExist file
    rebuild <- if not exists then return True else do
        mtime <- liftIO $ getModificationTime file
        now <- liftIO $ getCurrentTime
        return $ diffUTCTime now mtime > secs
    when rebuild $ act file

myRules :: Rules ()
myRules = "my_backup" *> buildEvery (24*60*60) buildBackupAt
-- The file name is a FilePattern that Shake turns into a FilePath; no wildcard here,
-- so it's simple, but you can use wildcards too, as long as your action pays attention
-- to the FilePath passed in.

This will rebuild the backup every day, but will not rebuild if the dependencies declared in buildBackupAt change.
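
The age check inside buildEvery can also be pulled out into a plain IO function, which makes it easy to try in GHCi without Shake. This is only a sketch: `needsRefresh` is a hypothetical name, and it relies solely on `System.Directory` and `Data.Time`:

```haskell
import Data.Time (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Directory (doesFileExist, getModificationTime)

-- | True when the file is missing, or when it was last modified more than
-- 'maxAge' seconds ago. ('needsRefresh' is a hypothetical helper name.)
needsRefresh :: NominalDiffTime -> FilePath -> IO Bool
needsRefresh maxAge file = do
    exists <- doesFileExist file
    if not exists
        then return True
        else do
            mtime <- getModificationTime file
            now <- getCurrentTime
            return (diffUTCTime now mtime > maxAge)
```

With this helper, buildEvery would reduce to calling alwaysRerun, running `liftIO (needsRefresh secs file)`, and invoking the action when the result is True.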

Neil Mitchell
  • I don't think this will work because Shake will skip running the `my_backup` rule since its dependencies will not have changed. If you add `alwaysRerun` it should work though, as then your custom logic will always get called. – Neil Mitchell Jul 23 '14 at 08:17
  • It doesn't work indeed. Once the file is done, the rule is never checked again. – mb14 Jul 23 '14 at 08:46

There are two senses of "writing rules" in Shake: 1) using *> or similar to define the rules specific to your build system; 2) defining new types of rules, e.g. defining operators such as *> yourself. Most users of Shake do 1 a lot, and never do 2. Your question seems entirely concerned with 2, which is certainly possible (all rules are written outside the core of Shake) but rarer.

To define something which runs while checking the build, you need to use the Development.Shake.Rule module, and define an instance of the type class Rule. You typically want to sugar up the apply1 function so people can use your rule in a type-safe way. If you are writing a simple rule (e.g. look up a modification date, see if it's changed) then it isn't too hard. If you are doing a more complex rule (e.g. check a file is no more than 1 day old) it's a bit trickier, but still possible - it needs more care thinking about what gets stored where. Taking your "rebuild if file is older than some number of seconds" example, we can define:

module MaximumAgeRule(maximumAge, includeMaximumAge) where

import Data.Maybe
import Development.Shake.Rule
import Development.Shake.Classes
import Development.Shake
import System.Directory as IO
import Data.Time

newtype MaxAgeQ = MaxAgeQ (FilePath, Double)
    deriving (Show,Binary,NFData,Hashable,Typeable,Eq)

instance Rule MaxAgeQ Double where
    storedValue _ (MaxAgeQ (file, secs)) = do
        exists <- IO.doesFileExist file
        if not exists then return Nothing else do
            mtime <- getModificationTime file
            now <- getCurrentTime
            return $ Just $ fromRational (toRational $ diffUTCTime now mtime)
    equalValue _ (MaxAgeQ (_, t)) old new = if new < t then EqualCheap else NotEqual

-- | Define that the file must be no more than N seconds old
maximumAge :: FilePath -> Double -> Action ()
maximumAge file secs = do
    apply1 $ MaxAgeQ (file, secs) :: Action Double
    return ()

includeMaximumAge :: Rules ()
includeMaximumAge = do
    rule $ \q@(MaxAgeQ (_, secs)) -> Just $ do
        opts <- getShakeOptions
        liftIO $ fmap (fromMaybe $ secs + 1) $ storedValue opts q

We can then use the rule with:

import Development.Shake
import MaximumAgeRule

main = shakeArgs shakeOptions $ do
    includeMaximumAge
    want ["output.txt"]
    "output.txt" *> \out -> do
        maximumAge out (24*60*60)
        liftIO $ putStrLn "rerunning"
        copyFile' "input.txt" "output.txt"

Now the file input.txt will be copied to output.txt every time it changes. In addition, if output.txt is more than one day old, it will be copied afresh.

How the usage works: Since we are using a custom rule, we have to declare that with includeMaximumAge (which is ugly, but unavoidable). We then call maximumAge when producing output.txt, saying that the file output.txt must be no more than 1 day old. If it is older than that, the rule reruns. Simple and reusable.

How the definition works: The definition is a bit complex, but I don't expect many people to define rules, so a StackOverflow question per rule definition seems reasonable :). We have to define a key and a value for the rule, where the key produces the value. For the key we declare a fresh type (as you always should for keys) which stores the filename and how old it is allowed to be. For the value, we store how old the file is. The storedValue function retrieves the value from the key by querying the file. The equalValue function looks at the value and decides if the value is EqualCheap (don't rebuild) or NotEqual (do rebuild). Normally equalValue does old == new as its main test, but here we don't care what the value was last time (we ignore old), but we do care what the threshold in MaxAgeQ is, and we compare it to the value.

The maximumAge function just invokes apply1 to add a dependency on MaxAgeQ, and includeMaximumAge defines what apply1 calls.
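
To make the contrast between the two kinds of equality test concrete, here is a small standalone sketch. It does not use Shake at all: `Verdict`, `usualCheck` and `maxAgeCheck` are hypothetical stand-ins (with `Unchanged` playing the role of EqualCheap and `Changed` the role of NotEqual), so treat it as an illustration of the decision only:

```haskell
-- Stand-in for the two outcomes described above; this is NOT Shake's own type.
data Verdict = Unchanged | Changed
    deriving (Eq, Show)

-- The usual test: the value is unchanged when the stored and fresh values agree.
usualCheck :: Eq a => a -> a -> Verdict
usualCheck old new = if old == new then Unchanged else Changed

-- The maximum-age test: the previous age is ignored, and the fresh age is
-- compared against the threshold carried in the key.
maxAgeCheck :: Double -> Double -> Double -> Verdict
maxAgeCheck threshold _old newAge =
    if newAge < threshold then Unchanged else Changed
```

For example, with a threshold of 86400 seconds, a file that is 10 seconds old gives `Unchanged` whatever its age was last time, while a file that is 90000 seconds old gives `Changed`.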

Neil Mitchell
  • That pretty much works, but if I touch `output.txt` (before the 24 hours) to trigger a rule depending on it, it also rebuilds `output.txt`. I was expecting `storedValue` to return the last modified time and `equalValue` to check it against the current time. I realize now that you can't use `now` outside `IO`. I'll try to modify your code to do what I need (although, as I said, it's more an exercise than a real need). Apart from that, everything seems really mysterious, but questions about it are probably out of the scope of this question. – mb14 Jul 24 '14 at 20:39
  • If you modify `output.txt` then it rebuilds anyway. The rule in Shake is that files are considered dirty if they have changed since Shake ran them, it isn't just timestamp comparison (unlike Make). – Neil Mitchell Jul 25 '14 at 09:10
  • Seems weird to have to rebuild a file to trigger the rebuild of its dependents. If that's the case, Shake might be missing a concept (the difference between *new* and *dirty*). – mb14 Jul 25 '14 at 10:23
  • `output.txt` is considered dirty if you modify `output.txt` or anything it depends on (i.e. `input.txt`, or waiting 1 day), and then it rebuilds. Its dependencies rebuild automatically based on when they need to rebuild. – Neil Mitchell Jul 25 '14 at 10:44
  • I understand, but if `output2.txt` depends on `output.txt`, you should be able to touch `output.txt` to rebuild `output2.txt` without having to rebuild `output.txt`. I might be nitpicking, but this is easy-ish with a normal make (with the `touch force` in cron every 24 hours). I thought this would be straightforward with Shake, but it appears to be much more complicated than I thought. I know most users don't write their own rules. If I'm switching to `Shake`, it's because I (in theory) can write my own rules. Otherwise, I'm happy with `make`. – mb14 Jul 25 '14 at 11:23
  • This has nothing to do with custom rules. Using totally normal rules, if you touch an output, it gets rebuilt. That is deliberate. If you want to define your own file rules which operate like Make, and only change if the output is older, you can do that too - but it's risky, as the output might no longer be correct. You can always touch `output2.txt` to force it to rebuild, rather than `output.txt`. – Neil Mitchell Jul 25 '14 at 12:18
  • I thought that with normal rules, Shake would behave like Make. Touching `output2.txt` is not an option. Often you touch `output.txt` and rebuild `output10.txt` not knowing that it needs `output[2..9].txt` and that they all depend on `output.txt`. Touching those 8 files is unacceptable (that totally defeats the point of using a build system) and rebuilding `output.txt` is unacceptable too. `output.txt` might be the result of a 4-hour batch, which anyway doesn't need to be updated. – mb14 Jul 25 '14 at 12:44
  • It seems this comment section has got a bit long. Perhaps you could synthesise the problem you are trying to solve (ignoring the rebuild every 24 hours, which this thread does solve) and we can discuss it separately. – Neil Mitchell Jul 25 '14 at 12:56