
I have tried with a p12 keyfile and it works successfully; I was able to fetch data from the GCS bucket. But with a JSON keyfile, the SparkSession is not picking up the JSON config values and falls back to the default metadata server instead. I am using Maven and IntelliJ for development. Below is the code snippet:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  System.out.println("hello gcp connect")
  System.setProperty("hadoop.home.dir", "C:/hadoop/")
  val sparkSession =
    SparkSession.builder()
      .appName("my first project")
      .master("local[*]")
      .config("spark.hadoop.fs.gs.project.id", "shaped-radius-297301")
      .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
      .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
      .config("spark.hadoop.google.cloud.project.id", "shaped-radius-297301")
      .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
      .config("spark.hadoop.google.cloud.auth.service.account.email", "service-account@shaped-radius-297301.iam.gserviceaccount.com")
      .config("spark.hadoop.google.cloud.service.account.json.keyfile", "C:/Users/shaped-radius-297301-5bf673d7f0d2.json")
      .getOrCreate()

  sparkSession.sparkContext.addFile("gs://test_bucket/sample1.csv")
  sparkSession.read.csv(SparkFiles.get("sample1.csv")).show()
}
  • Hi, please edit your question and put the text version of your code snippet instead of an image. It is best practice to post snippets as text; this way the community can read your config easily. – Donnald Cucharo Dec 17 '20 at 05:35

2 Answers


You need to work on your configuration. From the image you provided, it looks like your service account email and service account key are not correct. Please make sure that you are using the correct service account email, with the Cloud Storage Admin role in IAM, for example:

serviceaccount@project-id.iam.gserviceaccount.com

And the path to your service account key should point to where the key file actually resides on disk; the "path to json" must be the directory where your key is currently located.
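
For reference, here is a rough sketch of how the JSON key is usually wired through Spark's Hadoop configuration (property names as documented for the GCS connector; the project id and key path are placeholders, so adjust them to your setup):

import org.apache.spark.sql.SparkSession

// Sketch only: the standard GCS connector properties for service-account JSON auth.
// Note the "auth" segment in the keyfile property name; replace the placeholders.
val spark = SparkSession.builder()
  .appName("gcs-json-key-check")
  .master("local[*]")
  .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
  .config("spark.hadoop.fs.gs.project.id", "<PROJECT_ID>")
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "C:/path/to/<KEY_FILE>.json")
  .getOrCreate()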

Also, make sure that you are using a bucket that exists in your project, or else you'll get errors like "bucket does not exist" or "access denied".

UPDATE

OP updated the question, refer to this link. It is possible that GOOGLE_APPLICATION_CREDENTIALS is pointing to the wrong location, or the service account may not have the right IAM permissions.
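
If you want to rule out the environment variable itself, a quick sanity check from the driver (just a sketch) would be:

// Sketch: confirm the variable is visible to the JVM running the driver
// and that the file it points to actually exists.
val credPath = sys.env.get("GOOGLE_APPLICATION_CREDENTIALS")
println(s"GOOGLE_APPLICATION_CREDENTIALS = $credPath")
credPath.foreach(p => println(s"key file exists: ${new java.io.File(p).exists()}"))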

Donnald Cucharo
  • Thanks, Donnald, but I am running from my local Windows machine with IntelliJ. I set the system property like this: System.setProperty("GOOGLE_APPLICATION_CREDENTIALS", "C:/Users/shaped-radius-297301-5bf673d7f0d2.json"), but it does not seem to be working. – Leeladhar ponnagani Dec 17 '20 at 18:11
  • Try [adding the environment variable on your IDE](https://www.jetbrains.com/help/objc/add-environment-variables-and-program-arguments.html#add-environment-variables) – Donnald Cucharo Dec 18 '20 at 02:59
  • I have tried setting up the environment variable in different ways, but it is still the same. – Leeladhar ponnagani Dec 18 '20 at 15:57
  • I understand where you're coming from but if you still encounter errors with no actionable description and need troubleshooting assistance, I suggest contacting [GCP Tech support](https://cloud.google.com/support-hub) so we can properly take a look into your project. – Donnald Cucharo Dec 24 '20 at 01:47

There was a problem setting the credential/key file in Databricks, so I used

libraryDependencies += "com.github.samelamin" %% "spark-bigquery" % "0.2.6"

to set it up in one notebook in Scala:

import com.samelamin.spark.bigquery._

// Set up GCP credentials
sqlContext.setGcpJsonKeyFile("<JSON_KEY_FILE>")

// Set up BigQuery project and bucket
sqlContext.setBigQueryProjectId("<BILLING_PROJECT>")
sqlContext.setBigQueryGcsBucket("<GCS_BUCKET>")

and we are able to connect to Google correctly from another notebook via Python as well.
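
For completeness, a read through the same Scala connector would then look roughly like this (bigQuerySelect is the reader method shown in the library's README; the table below is just a public sample):

// Sketch: querying BigQuery via the samelamin connector after the setup above.
val df = sqlContext.bigQuerySelect(
  "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare]")
df.show()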