
How do I pass BigQuery credentials (an access key) when accessing data using PySpark on a local machine (Mac)?


1 Answer


You can't access BigQuery with an access key. You will need a server-to-server service account [1] to access BigQuery from anywhere, Spark included.

[1] https://developers.google.com/identity/protocols/OAuth2ServiceAccount

DoiT International
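
As a quick way to confirm the service account itself works before wiring it into Spark, here is a minimal sketch using the google-auth and google-cloud-bigquery client libraries; the keyfile path is a placeholder:

    from google.oauth2 import service_account
    from google.cloud import bigquery

    # Hypothetical path to the JSON keyfile downloaded for the
    # service account in the Google Cloud console.
    KEYFILE = "/path/to/service-account.json"

    credentials = service_account.Credentials.from_service_account_file(
        KEYFILE, scopes=["https://www.googleapis.com/auth/bigquery"]
    )

    # If this trivial query succeeds, the service account and keyfile
    # are valid; any remaining errors are Spark/connector configuration.
    client = bigquery.Client(credentials=credentials,
                             project=credentials.project_id)
    print(list(client.query("SELECT 1").result()))
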
  • Yes, I have a service account and a PEM key file, but my Spark is local, not on a server; would it work? If so, how do I pass the parameters? Thank you! – VP10 Feb 14 '16 at 22:25
  • @VP10 - what do you mean by "local, not on a server"? – DoiT International Feb 15 '16 at 06:21
  • I have Spark installed locally on my Mac; it's not on any server. I want to get data from BigQuery, and I'm seeing a credentials-related error. How should I pass account credentials using PySpark? Thanks in advance! – VP10 Feb 15 '16 at 06:27
  • @VP10 could you please post the errors you're seeing? – DoiT International Feb 15 '16 at 06:28
  • 1
    Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.: java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token – VP10 Feb 15 '16 at 06:29
  • @VP10 - you're seeing this error because the integration was designed to work only from Google Compute Engine instances. It tries to reach Google's metadata server, but it cannot, because that server is reachable only from within Google's GCE network. – DoiT International Feb 15 '16 at 06:31
  • I see! Thanks @VadimSolovey! – VP10 Feb 15 '16 at 06:32
  • If I understand correctly, you're trying to use the BigQuery connector from a local cluster. You should be able to do this, but (unlike with a GCE deployment) you'll need to use service-account "keyfile" authentication. This SO answer (http://stackoverflow.com/questions/25291397/migrating-50tb-data-from-local-hadoop-cluster-to-google-cloud-storage/25342520#25342520) explains how to do it for the GCS connector, but the BigQuery connector authentication should work the same. – William Vambenepe Feb 18 '16 at 07:39
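
Putting that keyfile approach together for PySpark: a minimal sketch, assuming the GCS and BigQuery Hadoop connectors are on the classpath and that the auth property names below (e.g. mapred.bq.auth.service.account.json.keyfile) match your connector version; the keyfile path, project id, and bucket are placeholders:

    from pyspark import SparkContext

    sc = SparkContext()

    # The BigQuery connector stages data through GCS, so the GCS
    # connector needs the same service-account keyfile (property names
    # per the bigdata-interop connectors; verify for your version).
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set("google.cloud.auth.service.account.enable", "true")
    hconf.set("google.cloud.auth.service.account.json.keyfile",
              "/path/to/service-account.json")
    hconf.set("fs.gs.project.id", "my-project")

    conf = {
        # Assumption: the BigQuery connector reads its credentials
        # from the mapred.bq.auth.* property prefix.
        "mapred.bq.auth.service.account.json.keyfile": "/path/to/service-account.json",
        "mapred.bq.project.id": "my-project",
        "mapred.bq.gcs.bucket": "my-temp-bucket",
        "mapred.bq.input.project.id": "publicdata",
        "mapred.bq.input.dataset.id": "samples",
        "mapred.bq.input.table.id": "shakespeare",
    }

    # Read the table as (key, JSON string) pairs via the Hadoop API.
    table_data = sc.newAPIHadoopRDD(
        "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "com.google.gson.JsonObject",
        conf=conf,
    )
    print(table_data.take(2))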