19

I'm trying to run RHadoop on Cloudera's Hadoop distribution (I can't remember if it's CDH3 or CDH4), and I'm running into an issue: RStudio Server doesn't seem to recognize my global environment variables.

In my /etc/profile.d/r.sh file, I have:

export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF=/usr/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/

When I run R from the terminal, I get:

> Sys.getenv("HADOOP_CMD")
[1] "/usr/bin/hadoop"

But when I run the same command in RStudio Server:

> Sys.getenv("HADOOP_CMD")
[1] ""

And as a result, when I try to run rhdfs:

> library("rJava", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
> library("rhdfs", lib.loc="/home/cloudera/R/x86_64-redhat-linux-gnu-library/2.15")
Error : .onLoad failed in loadNamespace() for 'rhdfs', details: 
    call: fun(libname, pkgname)
    error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for 'rhdfs'

Does anyone know where I should put my environment variables, if not in that r.sh file?

Thanks!

AI52487963

3 Answers

14

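You should set your environment variables in a per-user .Renviron file or in the site-wide Renviron.site. The site-wide file is R_HOME/etc/Renviron.site; the per-user .Renviron is looked up in the current working directory and then in your home directory. You can get more information by typing: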

> ?Startup

Someone had a similar issue here and this is what he did to solve it.
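
One detail that may trip people up: Renviron files take plain `NAME=value` lines, not shell `export` statements, so the contents of a profile script cannot be pasted in unchanged. Below is a minimal sketch of a conversion, assuming the variables live in a script like the `/etc/profile.d/r.sh` from the question. The function name `profile_to_renviron` is hypothetical, and the target path in the usage comment is an assumption that depends on where R is installed.

```shell
#!/bin/sh
# Sketch: turn `export NAME=value` lines from a shell profile script into
# the plain `NAME=value` lines that R's Renviron files expect.
# `profile_to_renviron` is a hypothetical helper name, not part of R or RStudio.

profile_to_renviron() {
    # $1: path to a shell script containing `export NAME=value` lines.
    # Prints one NAME=value line per export; all other lines are skipped.
    sed -n 's/^[[:space:]]*export[[:space:]][[:space:]]*\([A-Za-z_][A-Za-z0-9_]*=.*\)$/\1/p' "$1"
}

# Example usage (assumed paths; writing the site file typically needs root):
#   profile_to_renviron /etc/profile.d/r.sh >> "$(R RHOME)/etc/Renviron.site"
```

After appending the lines, restart the R session in RStudio Server and check again with `Sys.getenv("HADOOP_CMD")`.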

Charles Menguy
  • Hmm...I added HADOOP_CMD="/usr/bin/hadoop" to my ~/.Rprofile , but that didn't seem to do the trick. Rstudio server still gives a "" result for Sys.getenv("HADOOP_CMD"). – AI52487963 Jun 01 '13 at 01:33
  • 2
    Worked for me. I added `SOMEVAR = /somepath` to `/usr/lib/R/etc/Renviron` (`R.home()` returns `"/usr/lib/R"`). I restarted RStudio and `Sys.getenv('SOMEVAR')` correctly found the variable. – user1609452 Jun 01 '13 at 02:08
  • Sorry, I'm new to rstudio-server and can't seem to figure out where the Renviron profile would sit. In `/usr/lib/` I only have `rstudio-server` and subfolders `bin`, `extras`, `R`, `resources`, and `www`. Is there a recommended spot to start an Renviron file in? – AI52487963 Jun 01 '13 at 06:33
  • 2
    btw that 'here' link is currently outdated and I agree with @AI52487963 that the `?Startup` documentation is not very helpful for people new to rstudio-server – James Tobin Mar 13 '14 at 19:16
  • 2
    In case it helps anyone else: when you add new environment variables to Renviron and restart RStudio Server, you may not see them at first when you run `system("env")`. However, once you SWITCH projects at least once (after restarting RStudio Server), the new variables should show up in the output of `system("env")`. – FXQuantTrader Nov 26 '15 at 02:14
  • This solution works for me using RStudio Server . Thanks ! – akunyer Jul 14 '16 at 11:43
  • Unfortunately this sets the environment *after* R was started, not before, which means that environment variables that control application launch are disregarded. It appears that there is no workaround for the free version of RStudio Server, only for the Pro version. – Konrad Rudolph Nov 07 '19 at 11:07
3

Note that on Windows, R looks for the .Renviron file in /Users/<name>/Documents, while RStudio appears to expect the .Renviron file to be in /Users/<name>/.

-1

You can set your environment variables from inside RStudio like this:

Sys.setenv("/path to hadoop")

and then try loading rhdfs again.

  • 1
    This is an effective way to solve path issues, but please note that the correct syntax would be `Sys.setenv(HADOOP_CMD="/path/to/hadoop")`. – FvD Aug 18 '14 at 15:41