
I am using the https://github.com/potix2/spark-google-spreadsheets library to read a spreadsheet file in Spark. It works perfectly locally.

val df = sqlContext.read.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
    option("credentialPath", "/path/to/credential.p12").
    load("<spreadsheetId>/worksheet1")

I built a new assembly jar that includes all the credentials and used that jar to read the file, but I am having trouble reading the credentialPath file. I tried using

getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")

but the library only supports an absolute path. Please help me resolve this issue.

John
  • It might be because it's a rather bad idea to put credentials into a jar. Pass it along via ENV or deploy it separately. – Reactormonk Jan 28 '17 at 09:22
  • @Reactormonk, can you give me some suggestions or a link on how to do this with ENV? Thanks – John Jan 28 '17 at 09:27
  • Possibly related: https://softwareengineering.stackexchange.com/questions/205606/strategy-for-keeping-secret-info-such-as-api-keys-out-of-source-control – Reactormonk Jan 28 '17 at 10:39
  • @Reactormonk I will check and let you know. Thanks for sharing – John Jan 29 '17 at 04:19
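A minimal sketch of the environment-variable approach suggested in the comments above, assuming the `.p12` file is deployed separately on the machine running the driver and that its location is exported under a hypothetical variable named `SPREADSHEET_CREDENTIAL_PATH`. The environment is passed in as a `Map` so the lookup can be exercised without touching the real environment; in an application you would pass `sys.env`.

```scala
import java.nio.file.{Files, Paths}

object CredentialConfig {
  // Hypothetical variable name; use whatever name your deployment exports.
  val CredentialEnvVar = "SPREADSHEET_CREDENTIAL_PATH"

  /** Resolves the credential file path from the given environment.
    * Returns Left(message) if the variable is unset or the file is missing,
    * otherwise Right(absolutePath) suitable for the credentialPath option. */
  def credentialPath(env: Map[String, String]): Either[String, String] =
    env.get(CredentialEnvVar) match {
      case None => Left(s"$CredentialEnvVar is not set")
      case Some(raw) =>
        // Normalise to an absolute path, since the library expects one.
        val path = Paths.get(raw).toAbsolutePath
        if (Files.isRegularFile(path)) Right(path.toString)
        else Left(s"credential file not found: $path")
    }
}
```

At startup, something like `CredentialConfig.credentialPath(sys.env).fold(msg => sys.error(msg), identity)` could supply the value for `option("credentialPath", ...)`; failing fast on `Left` surfaces a missing credential before Spark starts reading, and the secret never has to be packaged into the jar.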

3 Answers


You can use the --files argument of spark-submit or SparkContext.addFile() to distribute a credential file. To get the local path of the credential file on a worker node, call SparkFiles.get("credential filename").

import org.apache.spark.SparkFiles

// you can also use `spark-submit --files=credential.p12`
sqlContext.sparkContext.addFile("credential.p12")
val credentialPath = SparkFiles.get("credential.p12")

val df = sqlContext.read.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
    option("credentialPath", credentialPath).
    load("<spreadsheetId>/worksheet1")
potix2
  • Thank you so much. I will try it and let you know. By the way, is it possible to update a sheet using this library? – John Jan 30 '17 at 11:37

Use SBT and try the Typesafe Config library.

Here is a simple but complete sample that reads some information from a config file placed in the resources folder.

Then you can assemble a jar file using the sbt-assembly plugin.

Amir Karimi
  • It's really easy to manage configuration using the Typesafe Config library. Thanks for that. But I really want to get the location of the file inside the jar: http://stackoverflow.com/questions/941754/how-to-get-a-path-to-a-resource-in-a-java-jar-file. Is that possible with Typesafe Config? I could not find a way to do it. @amirkarimi – John Jan 28 '17 at 10:46
  • As you said the library just supports absolute path. What about getting the resource as stream and then write it to a physical file and give the file path to the library? It might have some serious security issues, though. – Amir Karimi Jan 28 '17 at 11:36
  • BTW, try this: Give this as credentialPath: `ClassLoader.getSystemResource("/resources/...").toURI()`. – Amir Karimi Jan 28 '17 at 11:42
  • Writing to a temporary file is working. Is there any better solution apart from this? I tried the ClassLoader approach too, but no luck. Thanks – John Jan 29 '17 at 04:20
  • I checked out the source code of the library and I think this is the last solution. The library can be patched to allow creating `File` instance from other sources. [Here](https://github.com/potix2/spark-google-spreadsheets/blob/master/src/main/scala/com/github/potix2/spark/google/spreadsheets/DefaultSource.scala#L59) is the code which converts the `credentialPath` to a `File` instance. It would be a great opportunity to contribute to a Scala library ;) – Amir Karimi Jan 29 '17 at 09:41
  • Thanks for the valuable suggestions. I am a bit new to Scala, but I will take a look. – John Jan 29 '17 at 11:12
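A minimal sketch of the temporary-file workaround discussed in the comments above: read the packaged resource as a stream, copy it to a temporary file, and pass that file's absolute path as `credentialPath`. The `ClassLoader.getSystemResource(...).toURI()` route generally cannot work here, because a resource inside a jar has an opaque `jar:file:...!/...` URI rather than a filesystem path, so it cannot be turned into a `java.io.File`. The object and method names below are hypothetical.

```scala
import java.io.{IOException, InputStream}
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceFiles {
  /** Copies a stream into a fresh temporary file and returns its absolute path.
    * The file is registered for deletion on JVM exit, so it remains on local
    * disk while the application runs. The stream is always closed. */
  def toTempFile(in: InputStream, prefix: String, suffix: String): Path = {
    val tmp = Files.createTempFile(prefix, suffix)
    tmp.toFile.deleteOnExit()
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toAbsolutePath
  }

  /** Extracts a classpath resource (for example one packaged in the assembly
    * jar) to a temporary file. Throws IOException if the resource is missing. */
  def resourceToTempFile(resource: String, suffix: String): Path = {
    val in = getClass.getResourceAsStream(resource)
    if (in == null) throw new IOException(s"resource not found on classpath: $resource")
    toTempFile(in, "credential-", suffix)
  }
}
```

For example, `ResourceFiles.resourceToTempFile("/credential.p12", ".p12").toString` could be given to `option("credentialPath", ...)`, where the resource name is an assumption about how the jar is laid out. As noted in the comments, this still ships the secret inside the jar and writes it to local disk, which is why the environment-variable and `--files` approaches are usually preferable.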

If you're working in the Databricks environment, you can upload the credentials file.

Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, as described here, does not get you around this requirement, because the variable only holds the path to the file, not the credentials themselves. See here for more details about getting the right credentials and using the library.

Powers