
When building my Haskell project locally using stack build, only the changed source files are re-compiled. Unfortunately, I am not able to make Stack behave like this on GitHub Actions. Any suggestions please?

Example

I created a simple example with Lib.hs and Fib.hs. I even checked that the cached .stack-work folder is updated between builds, but Stack always compiles both files even when only one has changed.

Here is the example:

  1. (no cache used, builds both Lib.hs and Fib.hs + dependencies): https://github.com/MarekSuchanek/stack-test/runs/542163994
  2. (only Lib.hs changes, builds both Lib.hs and Fib.hs): https://github.com/MarekSuchanek/stack-test/runs/542174351

I can see from the logs (verbose Stack) that something in the cache is being updated, but it is not at all clear to me what or why. Stack correctly finds that only Lib.hs has changed: "stack-test-0.1.0.0: unregistering (local file changes: src/Lib.hs)", so I can't understand why everything gets compiled. I noticed that in run 2, Fib.hi is not updated in .stack-work, but the other files (Fib.o, Fib.dyn_hi, and Fib.dyn_o) are.

Note

Caching of ~/.stack works fine, and nothing is built when no source file has changed. Of course, this is a dummy example, but we have other projects with many more source files, where this would significantly speed up the build. When a non-source file changes (e.g. the README), nothing is built, as expected.

sjakobi
Marek Suchánek

2 Answers


The culprit for this problem is that stack uses timestamps (as many other tools do) to figure out whether a source file has changed. When you restore the cache on CI correctly, none of the dependencies will get rebuilt. The problem with the source files is that when the CI provider clones the repo for you, the timestamps of all the files in the repo are set to the date and time of the clone.
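To make the mechanism concrete, here is a toy model of timestamp-based change detection. It is a simplification under stated assumptions, not Stack's or GHC's actual code: the `Module` record, the "source newer than artefacts" rule, and the dates are all hypothetical and chosen only for illustration.

```haskell
import Data.Time (UTCTime (..), fromGregorian)

-- Hypothetical toy model, not Stack's or GHC's real code: a module counts as
-- out of date when its source file is newer than its cached build artefacts.
data Module = Module
  { modName      :: String
  , sourceMTime  :: UTCTime -- modification time of the .hs file
  , artefactTime :: UTCTime -- when the cached .o/.hi files were written
  }

isOutOfDate :: Module -> Bool
isOutOfDate m = sourceMTime m > artefactTime m

-- Names of the modules a timestamp-based check would recompile.
toRecompile :: [Module] -> [String]
toRecompile = map modName . filter isOutOfDate

-- A fresh clone stamps every source file with the clone time, which is
-- later than any artefact restored from the cache.
afterFreshClone :: UTCTime -> [Module] -> [Module]
afterFreshClone cloneTime = map (\m -> m { sourceMTime = cloneTime })

-- Illustrative dates only: midnight (UTC) on the given day of April 2020.
day :: Int -> UTCTime
day d = UTCTime (fromGregorian 2020 4 d) 0

-- Sources last edited on 1 and 2 April, artefacts cached on 10 April.
cachedBuild :: [Module]
cachedBuild =
  [ Module "Lib" (day 1) (day 10)
  , Module "Fib" (day 2) (day 10)
  ]
```

In this model `toRecompile cachedBuild` is empty, while `toRecompile (afterFreshClone (day 14) cachedBuild)` lists both modules even though neither file's contents changed.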

Hopefully the cause of the recompilation of unchanged source files makes sense now. So how do we work around this problem? The only real way is to restore each file's modification time to the time of the last git commit that changed it. I noticed this quite a while ago, and a bit of googling gave me some answers on SO; I think this is one of them: Restore a file's modification time in Git

I modified it a bit to suit my needs, and this is what I ended up with:

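  git ls-tree -r --name-only HEAD | while IFS= read -r filename; do
    # Commit time (Unix epoch seconds) of the last commit that touched this file
    TS="$(git log -1 --format="%ct" -- "${filename}")"
    # Set the file's modification time to that commit time (GNU date syntax)
    touch -m -t "$(date --date="@${TS}" "+%Y%m%d%H%M.%S")" "${filename}"
  done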

That worked great for me on Ubuntu CI for a while, but solving this problem in an OS-agnostic manner with bash is not something I wanted to do when I needed to set up Azure CI. For that reason I wrote a Haskell script that works with GHC 8.2 and newer without requiring any non-core dependencies. I use it for all of my projects; I'll embed the essence of it here and also provide a link to a permanent gist:

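import Data.Time (defaultTimeLocale, iso8601DateFormat, parseTimeM)
import System.Directory (setModificationTime)
import System.Environment (getArgs)
import System.Process (readProcess)

main :: IO ()
main = do
  -- Optional first argument: the revision to take timestamps from (default HEAD)
  args <- getArgs
  let rev = case args of
        [] -> "HEAD"
        (x:_) -> x
  -- All files and directories (-t) tracked at that revision
  fs <- readProcess "git" ["ls-tree", "-r", "-t", "--full-name", "--name-only", rev] ""
  let iso8601 = iso8601DateFormat (Just "%H:%M:%S%z")
      restoreFileModtime fp = do
        -- Committer date (strict ISO 8601) of the last commit that touched fp
        modTimeStr <- readProcess "git" ["log", "--pretty=format:%cI", "-1", rev, "--", fp] ""
        modTime <- parseTimeM True defaultTimeLocale iso8601 modTimeStr
        setModificationTime fp modTime
        putStrLn $ "[" ++ modTimeStr ++ "] " ++ fp
  putStrLn "Restoring modification time for all these files:"
  mapM_ restoreFileModtime $ lines fs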
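The fiddliest step in the script is turning git's `%cI` output into a `UTCTime`. As a sketch, assuming the `time` library is version 1.9 or newer (which provides `Data.Time.Format.ISO8601`), the same conversion can be written with `iso8601ParseM`; `parseCommitTime` is a hypothetical helper name, not part of the gist. The script above uses `iso8601DateFormat` instead, which also works with older `time` versions.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Time (UTCTime, ZonedTime, zonedTimeToUTC)
import Data.Time.Format.ISO8601 (iso8601ParseM)

-- Parse the output of `git log -1 --pretty=format:%cI` (strict ISO 8601 with
-- a numeric UTC offset, e.g. 2020-04-13T18:56:00-04:00) into a UTCTime.
-- Surrounding whitespace is ignored; invalid input yields Nothing.
parseCommitTime :: String -> Maybe UTCTime
parseCommitTime s = zonedTimeToUTC <$> (iso8601ParseM (trim s) :: Maybe ZonedTime)
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

Parsing as a `ZonedTime` first keeps the commit's UTC offset; `zonedTimeToUTC` then normalises it, so the same instant written with different offsets compares equal.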

How do you use it without much overhead? The trick is to:

  • use stack itself to run the script
  • use exactly the same resolver as the project.

These two points ensure that no redundant dependencies or GHC versions get installed. All in all, the only two things needed are stack and something like curl or wget, and it works cross-platform:

# Script for restoring source files modification time from commit to avoid recompilation.
curl -sSkL https://gist.githubusercontent.com/lehins/fd36a8cc8bf853173437b17f6b6426ad/raw/4702d0252731ad8b21317375e917124c590819ce/git-modtime.hs -o git-modtime.hs
# Restore modification times and set up GHC, if it wasn't restored from the cache.
# ${RESOLVER} must be the same resolver as the one used by the project.
stack script --resolver ${RESOLVER} git-modtime.hs --package base --package time --package directory --package process

Here is a real project that uses this approach and you can dig through it to see how it works: massiv-io

Edit: @Simon Michael mentioned in the comments that he can't reproduce this issue locally. The reason is that not everything on CI is the same as it is locally. Quite often the absolute path is different, for example, and possibly other things that I can't think of right now. Those things, together with the source file timestamps, cause the recompilation of the source files.

For example, follow these steps and you will find that your project gets recompiled:

~/tmp$ git clone git@github.com:fpco/safe-decimal.git
~/tmp$ cd safe-decimal
~/tmp/safe-decimal$ stack build
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...
Configuring safe-decimal-0.2.0.0...
safe-decimal> build (lib)
Preprocessing library for safe-decimal-0.2.0.0..
Building library for safe-decimal-0.2.0.0..
[1 of 3] Compiling Numeric.Decimal.BoundedArithmetic
[2 of 3] Compiling Numeric.Decimal.Internal
[3 of 3] Compiling Numeric.Decimal
...
~/tmp/safe-decimal$ cd ../
~/tmp$ mv safe-decimal safe-decimal-moved
~/tmp$ cd safe-decimal-moved/
~/tmp/safe-decimal-moved$ stack build
safe-decimal-0.2.0.0: unregistering (old configure information not found)
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...

You'll see that moving the project to a new location triggered a rebuild of the project. Even though the project itself was rebuilt, you will notice that none of the source files were recompiled. Now, if you combine that procedure with touching a source file, that source file will get recompiled.

To sum it up:

  • A change in the environment can cause the project to be rebuilt
  • A change in a source file's contents can cause that file (and the files that depend on it) to be recompiled
  • A change in the environment, together with a change in a source file's contents or timestamp, can cause the project, together with that source file, to be recompiled
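The three observations above can be captured in a small toy model. This is only a restatement of the answer's findings under simplifying assumptions (dependent modules are ignored, and the names `FileChange`, `Outcome`, and `outcome` are hypothetical); it is not Stack's actual algorithm.

```haskell
-- Hypothetical toy model of the observations above, not Stack's algorithm.
-- Dependent modules are ignored for brevity.
data FileChange = Unchanged | TouchedOnly | ContentChanged
  deriving (Eq, Show)

data Outcome = Outcome
  { projectRebuilt :: Bool     -- the package as a whole is rebuilt
  , recompiled     :: [String] -- modules recompiled in this model
  } deriving (Eq, Show)

-- First argument: whether the environment changed (e.g. the project moved).
outcome :: Bool -> [(String, FileChange)] -> Outcome
outcome envChanged files =
  Outcome built [name | (name, change) <- files, stale change]
  where
    -- The package is built if the environment or any file's contents changed.
    built = envChanged || any ((== ContentChanged) . snd) files
    stale ContentChanged = True
    -- A bare timestamp change only matters once the package is being built.
    stale TouchedOnly    = built
    stale Unchanged      = False
```

For instance, `outcome False [("Lib", TouchedOnly)]` recompiles nothing (touching a file locally), whereas `outcome False [("Lib", ContentChanged), ("Fib", TouchedOnly)]` recompiles both modules, which matches what the question observed on CI, where every file carries a fresh clone timestamp.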
lehins
  • I'm confused by this, because I don't seem to see timestamp affecting my local stack builds. Eg if I `touch` a source file, it's not rebuilt. – Simon Michael Apr 13 '20 at 18:56
  • Likewise if I touch the .{dyn_hi,dyn_o,hi,o} files. – Simon Michael Apr 13 '20 at 20:16
  • @SimonMichael I added an example to the answer. In short, you need to trigger the rebuild of a project in order for the timestamp to trigger recompilation. – lehins Apr 13 '20 at 20:56
  • Thank you for the detailed info, very helpful. I saw it, as you say: changed paths (eg from renaming the folder) causes a rebuild of (a) Setup.hs and (b) any other modules whose timestamp has changed. Do you know of any issue for this in https://github.com/commercialhaskell/stack/issues ? – Simon Michael Apr 13 '20 at 21:54
  • PS I've seen some unexplained rebuilds in my github actions jobs too. I'm not seeing which path would be different - CWD seems to be /home/runner/work/PROJ/PROJ always - but perhaps there is one.. – Simon Michael Apr 13 '20 at 21:59
  • Maybe: https://github.com/commercialhaskell/stack/issues/5125 – Simon Michael Apr 13 '20 at 22:10
  • No, I am not aware of any issues related to this behavior. To be honest, I don't know what else can cause a stack rebuild besides the path change, but I know for sure that rebuilds happen on CI even when the path doesn't change. It doesn't really pose a problem for me, since the solution I provided works pretty well for me :) – lehins Apr 13 '20 at 22:10
  • Thanks! Timestamps really solved this, but additionally, GitHub Actions by default does only a very limited fetch without any history, so it had to be adjusted to [fetch all history](https://github.com/actions/checkout#fetch-all-history-for-all-tags-and-branches) in order to recover timestamps correctly. – Marek Suchánek Apr 14 '20 at 06:37
  • Reposting from the [related reddit thread](https://www.reddit.com/r/haskell/comments/g00ldn/haskell_stack_on_github_actions): [Here's](https://github.com/simonmichael/hledger/commit/6057070cfd6deb16f65f625c4c6a7c9ee32bf9f4) an example of the proposed fix, including both parts. It's not entirely working for me. Eg, with no modules changed, previously it would recompile 49 of 50 modules, now it recompiles just 10 of 50 (the same ones each time: 22, 40-47, 50). – Simon Michael Apr 21 '20 at 01:54
  • @SimonMichael It doesn't look like you are using the actual Haskell script that I included in this answer. I would not recommend using the bash script; I included it only for historical reasons – lehins Apr 21 '20 at 02:18
  • Use these lines here instead. For Linux and Mac: https://github.com/lehins/massiv/blob/bad71fc1f38612710bfc1fde0ccdf26af90aa4f0/.azure/pipelines.yml#L16-L19 For Windows: https://github.com/lehins/massiv/blob/bad71fc1f38612710bfc1fde0ccdf26af90aa4f0/.azure/pipelines.yml#L48-L54 – lehins Apr 21 '20 at 02:20

I have submitted a PR that fixes this, so the modification time is no longer relied on!
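The PR's code isn't shown here, but the general idea of detecting changes by file contents rather than modification time can be sketched as follows. This is a minimal illustration, not Stack's implementation: the `Snapshot` map and the helper names are hypothetical, and FNV-1a is used only because it fits in a few lines.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- 64-bit FNV-1a: an illustrative choice, not the hash used by Stack.
fnv1a64 :: BS.ByteString -> Word64
fnv1a64 = BS.foldl' step 0xcbf29ce484222325
  where
    step h b = (h `xor` fromIntegral b) * 0x100000001b3

-- Hypothetical snapshot: file path -> digest of the file's contents.
type Snapshot = Map.Map FilePath Word64

snapshot :: [(FilePath, BS.ByteString)] -> Snapshot
snapshot = Map.fromList . map (fmap fnv1a64)

-- Read the given files from disk and record their digests.
snapshotFiles :: [FilePath] -> IO Snapshot
snapshotFiles fps = Map.fromList <$> mapM digest fps
  where
    digest fp = (,) fp . fnv1a64 <$> BS.readFile fp

-- Files that are new or whose contents differ from the previous snapshot.
-- Modification times play no part, so a fresh clone with unchanged
-- contents reports nothing.
changedFiles :: Snapshot -> Snapshot -> [FilePath]
changedFiles old new =
  [fp | (fp, h) <- Map.toList new, Map.lookup fp old /= Just h]
```

Storing the previous snapshot alongside the cached build and comparing it with `snapshotFiles` after checkout would flag only files whose bytes changed, regardless of the timestamps set by the clone.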

Andres S
  • This is merged now in stack 2.5.1 - thank you @Andres S. Unfortunately even with stack 2.5.1 I continued to see the error `Trouble loading CompilerPaths cache` that brought me to this thread. For me it was the caching key, which was not correctly identified: `key: ${{ runner.os }}-${{ matrix.ghc }}` did not work, `key: ${{ runner.os }}-${{ matrix.ghc }}-stack` did. – nevrome Feb 16 '21 at 17:24
  • @nevrome with stack (and cabal) now caching by content correctly, unfortunately ghc itself is not. I have spent too much time inside of the ghc build code to realize this. If I ever get some extra time I'll see about writing a proposal/PR to fix this but it will be an undertaking. If you compile a simple codebase with ghc, change the modified time of a file, and recompile the project with ghc, you'll notice that the file is recompiled. – Andres S Feb 16 '21 at 21:12
  • I see - so maybe my issue about dependency caching is entirely unrelated to this thread after all. I'll leave the comment here anyway, because maybe somebody comes across it just like I did. Keep up the good work! – nevrome Feb 17 '21 at 10:27
  • Good news! A WIP PR was just opened against GHC https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5130 – Andres S Mar 01 '21 at 14:01