I am writing a Google BigQuery connector for Spark; underneath, it uses the Google Hadoop connector.

Currently, the Google Hadoop connector requires a Google environment variable pointing to the credentials JSON file.

This can be annoying to set up when you're launching clusters outside the Dataproc world.

Is it bad practice to set it at runtime in the code? Or is there a workaround to tell the Hadoop connector to ignore the environment variable, since the key file has already been set in the "fs.gs.auth.service.account.json.keyfile" Hadoop configuration?

Dennis, since you're a contributor on the project, perhaps you can help this time too?

Sam Elamin
  • @dennis-huo you were mentioned – Pentium10 Feb 20 '17 at 13:18
  • Hmm, I don't recall where environment variables are used; it should only be using Hadoop configuration keys. Do you have a pointer to where in the code environment variables are being used? – Dennis Huo Feb 20 '17 at 22:33
  • @DennisHuo It's used when creating the client: val bigquery = { val credential = GoogleCredential.getApplicationDefault.createScoped(SCOPES); new Bigquery.Builder(new NetHttpTransport, new JacksonFactory, credential).setApplicationName("spark-bigquery").build() } – Sam Elamin Feb 22 '17 at 13:20

1 Answer

For those interested, I just set them at runtime using the gist below, which is written in Scala:

https://gist.github.com/jaytaylor/770bc416f0dd5954cf0f

Here is the code, in case the gist ever goes offline:

import java.util.{Collections, Map => JavaMap}
import scala.collection.JavaConverters._

trait EnvHacker {
/**
 * Portable method for setting env vars on both *nix and Windows.
 * Relies on JDK-internal fields; on recent JDKs (16+) the reflective access
 * is likely to fail unless the JVM is started with suitable --add-opens flags.
 * @see http://stackoverflow.com/a/7201825/293064
 */
def setEnv(newEnv: Map[String, String]): Unit = {
    try {
        // Patch the backing maps of ProcessEnvironment directly. The second
        // field typically exists only in the Windows JDK layout.
        val processEnvironmentClass = Class.forName("java.lang.ProcessEnvironment")
        val theEnvironmentField = processEnvironmentClass.getDeclaredField("theEnvironment")
        theEnvironmentField.setAccessible(true)
        val env = theEnvironmentField.get(null).asInstanceOf[JavaMap[String, String]]
        env.putAll(newEnv.asJava)
        val theCaseInsensitiveEnvironmentField = processEnvironmentClass.getDeclaredField("theCaseInsensitiveEnvironment")
        theCaseInsensitiveEnvironmentField.setAccessible(true)
        val cienv = theCaseInsensitiveEnvironmentField.get(null).asInstanceOf[JavaMap[String, String]]
        cienv.putAll(newEnv.asJava)
    } catch {
        case e: NoSuchFieldException =>
            // Fallback: unwrap the unmodifiable map returned by System.getenv().
            try {
                val classes = classOf[Collections].getDeclaredClasses()
                val env = System.getenv()
                for (cl <- classes) {
                    if (cl.getName() == "java.util.Collections$UnmodifiableMap") {
                        val field = cl.getDeclaredField("m")
                        field.setAccessible(true)
                        val obj = field.get(env)
                        val map = obj.asInstanceOf[JavaMap[String, String]]
                        // Caution: this replaces the whole environment seen by
                        // System.getenv(), not just the keys in newEnv.
                        map.clear()
                        map.putAll(newEnv.asJava)
                    }
                }
            } catch {
                case e2: Exception => e2.printStackTrace()
            }

        case e1: Exception => e1.printStackTrace()
    }
}

}
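
A less invasive alternative to patching the environment is to load the key file explicitly when building the client, so that GoogleCredential.getApplicationDefault (which reads GOOGLE_APPLICATION_CREDENTIALS) is never called. The following is only a sketch under assumptions: the object name ExplicitCredentials and the helper keyFilePath are hypothetical, BigqueryScopes.all() stands in for the SCOPES value from the comment above (which is not shown in the question), and JacksonFactory is assumed to be the jackson2 one from google-http-client.

```scala
import java.io.FileInputStream

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential
import com.google.api.client.http.javanet.NetHttpTransport
import com.google.api.client.json.jackson2.JacksonFactory
import com.google.api.services.bigquery.{Bigquery, BigqueryScopes}

// Hypothetical helper object, not part of any connector.
object ExplicitCredentials {
  // Hadoop configuration key named in the question.
  val KeyFileConf = "fs.gs.auth.service.account.json.keyfile"
  // Environment variable consulted by GoogleCredential.getApplicationDefault.
  val KeyFileEnv = "GOOGLE_APPLICATION_CREDENTIALS"

  /** Prefer the configured key file; fall back to the environment variable. */
  def keyFilePath(settings: Map[String, String],
                  env: Map[String, String] = sys.env): Option[String] =
    settings.get(KeyFileConf).orElse(env.get(KeyFileEnv))

  /** Build the client from an explicit key file (assumed scopes: all BigQuery scopes). */
  def bigquery(keyFile: String): Bigquery = {
    val in = new FileInputStream(keyFile)
    val credential =
      try GoogleCredential.fromStream(in).createScoped(BigqueryScopes.all())
      finally in.close()
    new Bigquery.Builder(new NetHttpTransport, new JacksonFactory, credential)
      .setApplicationName("spark-bigquery")
      .build()
  }
}
```

In a Spark job the settings map could presumably be filled from sc.hadoopConfiguration.get(ExplicitCredentials.KeyFileConf), so the same Hadoop configuration key drives both the Hadoop connector and the BigQuery client, and no environment variable is needed.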

Sam Elamin