
This question is related to this one: when writing a package, how do you specify a dependency (either in Imports or Depends) on an existing R package that is not on CRAN?

I am writing an R package that imports SparkR, which is no longer on CRAN (it is delivered with Spark, in the R folder). I have tried adding the GitHub link http://github.com/apache/spark/tree/master/R/pkg to the Additional_repositories field of my DESCRIPTION file, with no luck: the R CMD commands (INSTALL, check, etc.) keep complaining that SparkR could not be found. The same problem has been discussed in this post. My package is also too heavily dependent on SparkR to move it to Suggests.
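For reference, here is a minimal sketch of the relevant part of my DESCRIPTION (the package name is a placeholder):

```
Package: mypackage
Imports:
    SparkR
Additional_repositories: http://github.com/apache/spark/tree/master/R/pkg
```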

Could you please suggest an appropriate way to fix this, instead of just assuming the SparkR folder already exists in the user's R library folder?

Thanks

Pablo

2 Answers


What's wrong with assuming your user has SparkR already installed? If they're using Spark, then they already have it (since you said it comes with Spark). If they're not using Spark, then they don't need it (and presumably they don't need your package either). Put a message in your documentation somewhere about installing SparkR, if it bugs you.
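For example, a minimal sketch of what such a documentation note could tell users to run, assuming the SPARK_HOME environment variable points at their Spark installation (which bundles SparkR under R/lib):

```r
# Make the SparkR library that ships with Spark visible to this R session.
# Assumes SPARK_HOME points at the Spark installation directory.
spark_home <- Sys.getenv("SPARK_HOME")
if (nzchar(spark_home)) {
  .libPaths(c(file.path(spark_home, "R", "lib"), .libPaths()))
}
library(SparkR)
```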

If you just want one function from SparkR that is useful outside Spark, then just copy it into your own code (and acknowledge the source). SparkR is Apache licensed, so you're allowed to do this. Or if you don't want to copy, then write your own.

Hong Ooi
  • You're right; the thing is that it is not enough to have SparkR installed in the Spark home directory: the SparkR folder needs to be copied to one of the locations where R packages live (either the personal library folder, /R, or /usr/lib/R/library or /usr/lib64/R/library). Otherwise, the `R CMD` commands (and almost any check) complain because they won't find the SparkR dependency stated in the `Imports` field. I assumed there would be some elegant way to specify the dependency using only the `DESCRIPTION` file. – Pablo Mar 14 '16 at 12:02
  • I think I now understand the second option: creating a CRAN-like repository and using it to specify the dependency. I am not sure whether this will work from the `DESCRIPTION` dependency fields, since it does not with GitHub, but I will give it a try. Thank you, guys! – Pablo Mar 14 '16 at 12:03
  • @Pablo if someone wants to use SparkR, they have to copy the folder anyway. – Hong Ooi Mar 14 '16 at 12:08
  • Documenting the behavior is the way to go (+1). Copying SparkR code doesn't make sense. Not only is SparkR not a standalone package, it is tightly bound to the Spark binaries. You cannot simply assume that the __internal__ API is compatible between versions, and requiring a specific version of Spark on the cluster is simply unrealistic. Not to mention that Spark has to be built with SparkR support and each machine in the cluster requires an R interpreter. – zero323 Mar 14 '16 at 20:28

Two options:
  1. Give users instructions on how to install this specific package (see help(install.packages); it's one line once you know the path).
  2. Run your own repo. The drat package helps you run your own repository, for instance on GitHub.
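A rough sketch of both options; the tarball name, paths, and URLs below are placeholders, not anything prescribed by SparkR or drat:

```r
## Option 1: users install SparkR themselves from a local path, e.g. the
## R/pkg folder of a Spark source checkout (path is a placeholder):
install.packages("/path/to/spark/R/pkg", repos = NULL, type = "source")

## Option 2: as the package author, publish a SparkR source tarball to your
## own drat repository (assumes a drat repo checked out at ~/git/drat and
## served via GitHub Pages):
# install.packages("drat")
drat::insertPackage("SparkR_1.6.0.tar.gz", repodir = "~/git/drat")
```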

piccolbo
  • Thanks for your answer. Regarding the second option, I think I do not understand why this would solve the problem... I would still have to specify something (either the Spark repo or mine) in the `DESCRIPTION` file, right? Then I would run into the same problem. I was thinking it might be an error in the URL or something, maybe because there is no "SparkR" folder in the Spark repo (the R package structure itself starts in the "pkg" folder)? – Pablo Mar 14 '16 at 08:52
  • You just need to tell users to add your repo to the "repos" option, or pass it directly in the repos argument to `install.packages`. I think you should stop thinking about changing the DESCRIPTION file, because there is nothing there that can help with this problem. There is no way I know of to make `install.packages` work with the defaults in this situation. – piccolbo Mar 14 '16 at 15:37
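A sketch of what those user-facing instructions might look like; the repository URL and package name are placeholders:

```r
# Add the custom repository to the "repos" option ...
options(repos = c(getOption("repos"),
                  drat = "https://yourname.github.io/drat"))
install.packages("yourpackage")

# ... or pass it directly in the repos argument of install.packages():
install.packages("yourpackage",
                 repos = c("https://yourname.github.io/drat",
                           getOption("repos")))
```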