
I have to restructure a big project written in R, which will later consist of several packages and involve several developers. Everything is set up on a git server.

The question is: how do I manage frequent changes inside packages without having to rebuild them every time, and without developers having to reinstall them after every pull? Is there a best practice or some automation for that? I don't want to source() unbuilt packages and .R files, but would like to stick with a package-like structure as much as possible. We will work in a Windows environment.

Thanks.

Vadim Kotov
Jax
  • Does packrat fit your needs? – Bishops_Guest Apr 20 '17 at 15:33
  • Do you practise continuous integration? – Hugh Apr 20 '17 at 15:35
  • @Hugh Not yet, but I want to get them there. But I'm not that experienced in developing bigger R projects and doing continuous integration with it seems complicated. – Jax Apr 20 '17 at 15:38
  • Would something simple like `stopifnot(packageVersion("ggplot2") >= package_version("2.0.0"))` suffice? – Hugh Apr 20 '17 at 15:42
  • @Bishops_Guest packrat is also very interesting and I'm already thinking about introducing it – Jax Apr 20 '17 at 15:51
  • @Hugh would work to prevent them from working with old packages but still does not make the process faster. I looked into the options I have for build servers in R but yeah...nothing nice. – Jax Apr 20 '17 at 15:53
  • @Bishops_Guest I have the same question and I also thought about packrat. But from what I understand, packrat is for dependency management, and I am more interested in version control of the code. Do you have experience there? – Christoph May 01 '17 at 11:19
  • @Christoph maybe my answer posted could help you as well – Jax May 04 '17 at 08:54

1 Answer


So I fiddled around for a while, tried different setups, and came up with an arrangement that fits my needs.

It basically consists of two git repositories. The first one (let's call it the base-repo) contains most of the scripts on which all later packages are based. The second one we will call the package-repo. Most development work should be done in the base-repo, which is under CI control via a build server and unit tests.

The package-repo contains a folder for each package we want to build, plus the base-repo as a git submodule.
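As a rough sketch of that layout, the submodule could be registered from the root of the package-repo along these lines. The function name `add_base_submodule` and the folder name `base-repo` are placeholders chosen for illustration, not names from the actual setup.

```shell
#!/bin/sh
# Sketch only: run from the root of the package-repo.
set -eu

# Register the base-repo as a submodule and pin it to a commit or tag.
# $1: URL of the base-repo, $2: commit or tag to pin (both supplied by the caller)
add_base_submodule() {
    base_url="$1"
    base_ref="$2"
    git submodule add "$base_url" base-repo
    # Move the submodule to the requested ref (leaves it on a detached HEAD).
    git -C base-repo checkout "$base_ref"
    # Stage the submodule entry so the pinned commit is recorded on the next commit.
    git add .gitmodules base-repo
}
```

After calling it with the URL of your own base-repo and a tag, a normal `git commit` in the package-repo records which base-repo commit the packages are built from.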

Each package can now be constructed via a very simple bash/shell script ("build script"), which:

  • checks out the commit/tag of the base-repo submodule on which the stable package build should be based
  • copies the files needed for the package into that package's folder
  • checks and builds the package
  • can also create a history file for the package
  • can be invoked either manually or by a build server
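The steps above could be sketched roughly as follows. This is not the actual build script: the function names, the `HISTORY` file name, and the layout (submodule in `./base-repo`, one folder per package with an `R/` subfolder, folder name equal to the package name in DESCRIPTION) are all assumptions. On Windows it would have to run in something like Git Bash with R on the PATH.

```shell
#!/bin/sh
# Sketch of a per-package build script, run from the root of the package-repo.
set -eu

# Step 1: move the submodule to the commit or tag the build is based on.
checkout_base() {
    git -C base-repo fetch --tags
    git -C base-repo checkout "$1"
}

# Step 2: copy the listed scripts into the package's R/ folder.
# $1: source folder, $2: package folder, remaining args: file names
stage_files() {
    src_dir="$1"
    pkg_dir="$2"
    shift 2
    mkdir -p "$pkg_dir/R"
    for f in "$@"; do
        cp "$src_dir/$f" "$pkg_dir/R/"
    done
}

# Step 3: build the source tarball, then check it.
# Assumes the folder name matches the Package field and only one tarball matches.
build_and_check() {
    R CMD build "$1"
    R CMD check "$(basename "$1")"_*.tar.gz
}

# Step 4 (optional): append the base-repo commit used for this build.
record_history() {
    printf '%s  base-repo %s\n' "$(date +%Y-%m-%d)" \
        "$(git -C base-repo log -1 --format='%h %s')" >> "$1/HISTORY"
}
```

A build of a hypothetical package `mypkg` would then call `checkout_base`, `stage_files base-repo/R mypkg utils.R`, `build_and_check mypkg` and `record_history mypkg` in that order, either by hand or from a build server job.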

This approach can also be combined with packrat. Additional code that is very package-specific can also be added to the package-repo, where it is under version control but independent of the base-repo.

The approach could be extended further so that pushes to the base-repo trigger builds of the packages in the package-repo. Packages whose build script points to master rather than a fixed commit will always be up to date, and if they are under the control of a build server, it will ensure that changes to the base-repo do not break the package. It is also possible to create several packages containing the same scripts from the base-repo.
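For the "pointing to master" variant, one possible sketch is to let the submodule track the base-repo's master branch and move it forward before each build. The function name `update_base_to_master` is made up here, and it assumes the submodule is registered under the name `base-repo`.

```shell
#!/bin/sh
# Sketch only: run from the root of the package-repo.
set -eu

# Record master as the tracked branch and move the submodule to its latest commit.
update_base_to_master() {
    git config -f .gitmodules submodule.base-repo.branch master
    git submodule update --remote base-repo
}
```

A build server job triggered by a push to the base-repo could call this first and then run the build script, so a broken package shows up as a failed build.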

See also: git: symlink/reference to a file in an external repository

Community
Jax