I'm wondering whether PySpark supports S3 access using IAM roles. Specifically, I have a business constraint that requires me to assume an AWS role in order to access a given bucket. This is fine when using boto (it's part of the API), but I can't find a definitive answer as to whether PySpark supports this out of the box.
Ideally, I'd like to be able to assume a role when running in standalone mode locally and point my SparkContext to that s3 path. I've seen that non-IAM calls usually follow:
# Run locally in standalone mode
spark_conf = SparkConf().setMaster('local[*]').setAppName('MyApp')
sc = SparkContext(conf=spark_conf)
# Access key ID and secret key embedded directly in the URL
rdd = sc.textFile('s3://<MY-ID>:<MY-KEY>@some-bucket/some-key')
Does something like this exist for providing IAM info?
rdd = sc.textFile('s3://<MY-ID>:<MY-KEY>:<MY-SESSION>@some-bucket/some-key')
or
rdd = sc.textFile('s3://<ROLE-ARN>:<ROLE-SESSION-NAME>@some-bucket/some-key')
If not, what are the best practices for working with IAM creds? Is it even possible?
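If it's not supported natively, here's an untested sketch of the kind of workaround I'm imagining: call STS to assume the role (I'm showing boto3, though plain boto has an equivalent assume_role), then hand the temporary credentials to the Hadoop S3A configuration. The role ARN here is a placeholder, and from what I can tell fs.s3a.session.token and TemporaryAWSCredentialsProvider require a newer Hadoop (2.8+) than the builds that usually ship with Spark 1.6, so I'm not sure this would work in my environment:

import boto3
from pyspark import SparkConf, SparkContext

# Assume the role via STS to get temporary credentials
sts = boto3.client('sts')
resp = sts.assume_role(RoleArn='arn:aws:iam::123456789012:role/my-role',  # placeholder ARN
                       RoleSessionName='my-spark-session')
creds = resp['Credentials']

spark_conf = SparkConf().setMaster('local[*]').setAppName('MyApp')
sc = SparkContext(conf=spark_conf)

# Pass the temporary credentials to the S3A filesystem.
# NOTE: session-token support below needs Hadoop 2.8+.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set('fs.s3a.access.key', creds['AccessKeyId'])
hadoop_conf.set('fs.s3a.secret.key', creds['SecretAccessKey'])
hadoop_conf.set('fs.s3a.session.token', creds['SessionToken'])
hadoop_conf.set('fs.s3a.aws.credentials.provider',
                'org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider')

rdd = sc.textFile('s3a://some-bucket/some-key')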
I'm using Python 2.7 and PySpark 1.6.0.
Thanks!