What I'm trying to do:
- read from and write to S3 buckets across multiple AWS_PROFILEs
resources:
- https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration
  - shows how to set different credentials per bucket (a minimal sketch of this pattern follows this list)
  - shows how to use different credential providers
  - doesn't show how to use more than one AWS_PROFILE
- https://spark.apache.org/docs/latest/cloud-integration.html#authenticating
- https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html
- "No FileSystem for scheme: s3" with pyspark
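For context, the per-bucket pattern from that Hadoop doc looks roughly like this when expressed as Spark conf keys (the bucket names and credentials below are placeholders, not values from my setup):

```python
from pyspark.sql import SparkSession

# Per-bucket S3A configuration, passed through to Hadoop via the
# "spark.hadoop." prefix. "landing" and "warehouse" are placeholder buckets.
spark = (
    SparkSession.builder
    # bucket "landing" uses static access keys
    .config("spark.hadoop.fs.s3a.bucket.landing.access.key", "AKIA...")
    .config("spark.hadoop.fs.s3a.bucket.landing.secret.key", "...")
    # bucket "warehouse" uses a different credential provider entirely
    .config(
        "spark.hadoop.fs.s3a.bucket.warehouse.aws.credentials.provider",
        "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    )
    .getOrCreate()
)
```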
What I have working so far:
- AWS SSO works, and I can access different resources in Python via boto3 by changing the AWS_PROFILE environment variable (see the boto3 sketch after this list)
- Delta Spark can read and write to S3 using Hadoop configurations (a consolidated sketch follows this list)
  - enable Delta tables for PySpark:
    builder.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension").config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  - allow the s3:// scheme for reads/writes by mapping it to the S3A filesystem:
    "spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - use the instance profile (rather than an AWS_PROFILE) as the credentials for one or more buckets:
    "fs.s3a.bucket.{prod_bucket}.aws.credentials.provider", "com.amazonaws.auth.InstanceProfileCredentialsProvider"
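For reference, the boto3 part that already works is essentially this (the profile names and output are just placeholders for my SSO profiles):

```python
import os
import boto3

# Switching profiles via the environment variable, as described above.
# "dev-sso" and "prod-sso" are placeholder profile names from ~/.aws/config.
os.environ["AWS_PROFILE"] = "dev-sso"
dev_s3 = boto3.client("s3")
print(dev_s3.list_buckets()["Buckets"][:3])

# An explicit session is equivalent and avoids mutating the environment.
prod_s3 = boto3.Session(profile_name="prod-sso").client("s3")
print(prod_s3.list_buckets()["Buckets"][:3])
```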
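And putting the Spark pieces above together, a minimal sketch of the working setup. The bucket name, table path, and the configure_spark_with_delta_pip helper (from the delta-spark package, used here to pull in the Delta jars) are my placeholders/choices, not a definitive setup:

```python
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

prod_bucket = "my-prod-bucket"  # placeholder bucket name

builder = (
    SparkSession.builder.appName("delta-s3")
    # enable Delta tables for PySpark
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # route the s3:// scheme through the S3A filesystem
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # this bucket authenticates with the instance profile
    .config(f"spark.hadoop.fs.s3a.bucket.{prod_bucket}.aws.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# sanity check: read a Delta table from the bucket ("tables/events" is a placeholder path)
df = spark.read.format("delta").load(f"s3://{prod_bucket}/tables/events")
df.show(5)
```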
Any help, suggestions, or comments are appreciated. Thanks!