5

I am interested in understanding whether bazel can handle "two stage builds", where dependencies are discovered based on the file contents and dependencies must be compiled before the code that depends on them (unlike C/C++ where dependencies are mostly header files that are not separately compiled). Concretely, I am building the Coq language, which is similar to OCaml.

My intuition for creating a build plan would use an (existing) tool (called coqdep) that reads a .v file and returns a list of all of its direct dependencies. Here's the algorithm that I have in mind:

  1. Invoke coqdep on the target file and (transitively) on each of the files it depends on.
  2. Once the transitive dependencies for a target are computed, add a rule to build the .vo from the .v that includes those transitive dependencies.

Ideally, the calls to coqdep (in step 1) would be cached between builds and so only need to be re-computed when the file changes. And the transitive closure of the dependency information would also be cached.
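The scanning step above can be sketched outside of Bazel as a small script. This is only an illustration of the algorithm, assuming coqdep's makefile-style output (lines such as `A.vo: A.v B.vo`); the exact format may differ across Coq versions.

```python
# Sketch: recover each .vo target's direct dependencies from
# coqdep-style output, then compute the transitive closure.

def parse_coqdep(output):
    """Map each .vo target to its direct .vo dependencies."""
    deps = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        target, rhs = line.split(":", 1)
        deps[target.strip()] = [d for d in rhs.split() if d.endswith(".vo")]
    return deps

def transitive_deps(target, deps):
    """Depth-first walk over the direct-dependency map."""
    seen = []
    stack = [target]
    while stack:
        for d in deps.get(stack.pop(), []):
            if d not in seen:
                seen.append(d)
                stack.append(d)
    return seen

example = "A.vo: A.v B.vo\nB.vo: B.v C.vo\nC.vo: C.v\n"
print(sorted(transitive_deps("A.vo", parse_coqdep(example))))
# → ['B.vo', 'C.vo']
```

The closure computed here is exactly what would need to be cached between builds: it only changes when a file's coqdep output changes.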

Is it possible to implement this in bazel? Are there any pointers to setting up builds for languages like this? Naively, it seems to be a two-stage build and I'm not sure how this fits into bazel's compilation model. When I looked at the rules for OCaml, it seemed like they were relying on ocamlbuild to satisfy the build order and dependency requirements rather than doing it "natively" in bazel.

Thanks for any pointers or insights.

Gregory
  • 1,205
  • 10
  • 17
  • The first stage of `dune` support for Coq did exactly that; since then we have moved on, so dune itself can now understand coqdep output. So if you are interested, ping me on Zulip and I can point you to a version of the Coq tree with the proper script, which could easily be adapted to generate bazel rules. Note that Dune does indeed already support some of the features you are interested in, and a few more goodies that turn out to be very convenient. You may have some pain with Bazel due to the coupling of .v files with plugins. – ejgallego Dec 19 '20 at 14:43
  • 1
    Thanks. `dune` is indeed an option as well. I also need support for building Coq from C++ (tracking #include dependencies); that seems like it would require patches to dune (unless there is some sort of plugin support), or possibly another script to generate a bunch of dune rules. I'm also very interested in the remote caching, which could help to streamline our CI. I'm not certain how well it will work with Coq, but it is intriguing. – Gregory Dec 19 '20 at 15:50
  • dune already supports C++ code, but that support is not very complete. What I mean is that you could use our infrastructure for generating `dune` rules to generate `bazel` rules; we did this kind of staging already and it worked pretty well. – ejgallego Dec 19 '20 at 19:32
  • Your main problem will be teaching bazel to build OCaml files; this is needed when your .v files depend on a Coq plugin, otherwise you need another staging phase. Unfortunately, building OCaml libraries is not easy. – ejgallego Dec 19 '20 at 19:33
  • 1
    Thanks for the insights. I saw that there was OCaml support for bazel, but I didn't look into it enough to be sure that it would support the Coq use case. Given that the build times for ML are negligible compared to Coq, if we didn't get much caching there it wouldn't be too bad. I will follow up regarding dune. – Gregory Dec 20 '20 at 00:31
  • Dune caching is excellent and it is getting much better, so if that's your main concern it may indeed be worth it. – ejgallego Dec 20 '20 at 02:57

2 Answers

3

(don't have enough rep to comment yet, so this is an answer)

#2 of Toxaris' answer is probably the most canonical.

gazelle is an example of this for Golang, which is in the same boat: dependencies for Golang files are determined outside a Bazel context by reading the import statements of source files. gazelle is a tool that writes/rewrites Golang rules in BUILD files according to the imports in source files of the Bazel workspace. Similar tools could be created for other languages that follow this pattern.

> but the generated BUILD file will be in the output folder, not in the source folder. So you also have to provide an executable that copies the files back into the source folder.

Note that binaries run via bazel run have the environment variable BUILD_WORKSPACE_DIRECTORY set to the root of the Bazel workspace (see the docs) so if your tool uses this environment variable, it could edit the BUILD files in-place rather than generating and copying back.
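A minimal sketch of such a tool follows, assuming only what the Bazel docs guarantee: that `bazel run` sets `BUILD_WORKSPACE_DIRECTORY` to the workspace root. The package layout, the `BUILD.bazel` naming, and the fallback to the current directory are illustrative choices, not part of any real tool.

```python
# Sketch: resolve BUILD files relative to BUILD_WORKSPACE_DIRECTORY so
# that edits made by a `bazel run` target land in the source tree
# rather than in the sandboxed execution root.
import os

def build_file_path(package):
    # Under `bazel run`, BUILD_WORKSPACE_DIRECTORY points at the
    # workspace root; fall back to the current directory otherwise.
    workspace = os.environ.get("BUILD_WORKSPACE_DIRECTORY", os.getcwd())
    return os.path.join(workspace, package, "BUILD.bazel")

def rewrite_build_file(package, contents):
    """Overwrite the package's BUILD file in-place in the source tree."""
    with open(build_file_path(package), "w") as f:
        f.write(contents)
```

A tool structured this way can edit BUILD files in-place, avoiding the copy-back step entirely.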

(In fact, the generating-and-copying-back strategy would likely not be feasible, because purely-generated files would contain only Coq rules, and not any other types of rules. To generate a BUILD file with Coq rules from one with other types of rules, one would have to add the BUILD files themselves as dependencies - which would create quite the mess!)

Scott Minor
  • 137
  • 1
  • 7
  • hmm, but Gregory wants to have an *incremental* build of the `BUILD` file (to avoid calling `coqdep` on all the files all the time). I guess one could have build rules that call `coqdep` and put the results in files, then have these files as input to the executable that in-place edits the `BUILD` files. Then `bazel run //tools/update-coq-build-files` would benefit from caching the results of `coqdep`. -- I tried to add this idea to my answer :) – Toxaris Dec 22 '20 at 16:26
2

I'm looking into similar questions because I want to build ReasonML with Bazel.

Bazel computes the dependencies between Bazel targets based on the BUILD files in your repository without accessing your source files. The only interaction you can do with the file system during this analysis phase is to list directory contents by using glob in your rule invocations.
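For example, the most a coarse-grained target can do during analysis is enumerate its source files; it cannot look inside them. The `coq_library` rule name here is hypothetical, purely to illustrate the limitation:

```python
# Hypothetical BUILD file. glob() is the only file-system access
# available during the analysis phase, so the dependencies *between*
# these .v files cannot be discovered here.
coq_library(
    name = "theories",
    srcs = glob(["*.v"]),
)
```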

Currently, I see four options for getting fine-grained incremental builds with Bazel:

  1. Spell out the fine-grained dependencies in hand-written BUILD files.
  2. Use a tool for generating the BUILD files. You cannot directly wrap that tool in a Bazel rule to have it run during bazel build because the generated BUILD file would be in the output folder, not in the source folder. But you can run rules that call coqdep during the build, and provide an executable that edits the BUILD file in the source folder based on the (cacheable) result of the coqdep calls. Since you can read both the source and the output folder during the build, you could even print a message to the user if they have to run the executable again. Anyway, the full build process would be bazel run //tools/update-coq-build-files && bazel build to reach a fixed point.
  3. Have coarse-grained dependencies in the BUILD files but persistent workers to incrementally rebuild individual targets.
  4. Have coarse-grained dependencies in the BUILD files but generate a separate action for each target file and use the unused_inputs_list argument of ctx.actions.run to communicate to Bazel which dependencies were actually unused.

I'm not really sure whether 3 and 4 would actually work or how much effort would be involved, though.
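The rule-generation half of option 2 can be sketched as follows. The `coq_module` rule name and the one-target-per-file layout are hypothetical; the input is assumed to be a map from each .v file to its direct .vo dependencies, as recovered from the cached coqdep results.

```python
# Sketch: render BUILD rules from a direct-dependency map. This is the
# piece //tools/update-coq-build-files would run before writing the
# result back into the source tree.

def render_build_file(direct_deps):
    rules = []
    for src, deps in sorted(direct_deps.items()):
        name = src[:-2]  # strip ".v"
        dep_labels = ", ".join('":%s"' % d[:-3] for d in sorted(deps))
        rules.append(
            'coq_module(\n'
            '    name = "%s",\n'
            '    src = "%s",\n'
            '    deps = [%s],\n'
            ')\n' % (name, src, dep_labels)
        )
    return "\n".join(rules)

print(render_build_file({"A.v": ["B.vo"], "B.v": []}))
```

Because the rendering is deterministic, rerunning the tool when no coqdep output changed produces an identical BUILD file, which is what makes the `bazel run ... && bazel build` loop converge to a fixed point.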

Toxaris
  • 7,156
  • 1
  • 21
  • 37
  • Generally, it sounds like bazel is not built to do this sort of thing, which is quite unfortunate. The exchange above about dune suggests that this is possible in that system, but dune itself is not very extensible without editing the source code of the tool itself and rebuilding. In my exploration for an answer, I ran into the pluto build system https://pluto-build.github.io/ which does support this, but it is very new and doesn't have the other nice features of bazel. – Gregory Jan 03 '21 at 18:34
  • Yes I think pluto is designed to support this situation. It was developed in the context of language extensibility, where you discover the activation of a language extension in the middle of processing a source file, then have to compile that language extension, then continue compiling your original source file. So pluto builders can declare additional dependencies at build time. I'm not aware that development of pluto substantially continued after the OOPSLA paper (Erdweg et al. 2015), but maybe I'm wrong. – Toxaris Jan 07 '21 at 00:58