Your accepted answer is great as long as whoever is using the PowerBI query has the AWS CLI on their machine. My use case is different: the person using this PowerBI template wants to log in once and be done with authentication from that point on.
Another option for accessing an S3 bucket through PowerBI is to use AWS API Gateway and Lambda functions to create a pre-signed URL, as per this question on the Power BI forums. Here is the relevant answer from the user spinwards:
1. Create a new Lambda function; it can be empty for now.
2. Set up a new API in API Gateway.
3. Create a new GET method.
   - 3.1 Select Lambda Function for the integration type
   - 3.2 Select the Use Lambda Proxy integration option
   - 3.3 Select the region and type in the name of the Lambda function you created in step 1
4. Edit your Lambda function.
   - 4.1 Using the AWS SDK, generate a URL with a pre-signed key for your file
   - 4.2 Return a 303 redirect to that URL
Here is a sample Lambda function in Python 2.7:

import boto3

def lambda_handler(event, context):
    bucket = 'bucket-name'
    key = 'path-to-file'
    client = boto3.client('s3')
    # Generate a URL signed with a pre-signed key, valid for 7200 seconds
    link = client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=7200,
        HttpMethod='GET')
    # With Lambda proxy integration, this dict becomes the HTTP response
    return {
        "statusCode": 303,
        "headers": {'Location': link}
    }
You can use this PowerBI query as a starting point:
let
    // Replace the URI with the "invoke url" of your API Gateway endpoint
    // See: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-call-api.html#how-to-call-api-console
    uri = "https://your-web-api.execute-api.us-east-1.amazonaws.com/stage/endpoint",

    // PowerBI should understand the 303 redirect and follow it to your presigned S3 url.
    // Make sure to set IsRetry to true so that PowerBI gets a fresh url with a
    // fresh presigned key every time.
    raw = Web.Contents(uri, [IsRetry=true]),

    // My file is a gzipped utf-8 tsv, so I need to decompress it
    // and tell the csv parser the delimiter and encoding
    binary = Binary.Decompress(raw, Compression.GZip),
    csv = Csv.Document(binary, [
        Delimiter="#(tab)",
        extraValues=ExtraValues.Ignore,
        Encoding=65001]) // 65001 is utf-8
in
    csv
Once you get everything working with a single file, you can parameterize your API and Lambda function to accept an S3 path, so that you can pull in anything from your S3 account.
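A minimal sketch of what that parameterized Lambda might look like, assuming the caller passes `bucket` and `key` query parameters (those parameter names are my own choice, not part of spinwards's answer):

```python
def parse_s3_path(event):
    """Pull the requested bucket and key out of an API Gateway proxy event.

    The query-parameter names ("bucket", "key") are an assumption for this
    sketch; pick whatever contract suits your API.
    """
    params = (event or {}).get('queryStringParameters') or {}
    bucket, key = params.get('bucket'), params.get('key')
    if not bucket or not key:
        raise ValueError('both "bucket" and "key" query parameters are required')
    return bucket, key


def lambda_handler(event, context):
    import boto3  # imported lazily so the parsing logic is testable without the SDK
    bucket, key = parse_s3_path(event)
    link = boto3.client('s3').generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=7200,
        HttpMethod='GET')
    return {"statusCode": 303, "headers": {'Location': link}}
```

On the PowerBI side, the `uri` in the query would then carry the same parameters, e.g. `...endpoint?bucket=my-bucket&key=path-to-file`.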
Spinwards also offers the following warning about authentication:
Before you roll this out to production you should think about some kind of authentication. The normal AWS IAM authentication tokens don't play nicely with PowerBI, so you will need to use a custom authentication method for your API Gateway: https://docs.aws.amazon.com/apigateway/latest/developerguide/use-custom-authorizer.html. I ended up using basic auth which is not the strongest thing in the world, but API Gateway uses https, so it should be secure enough.
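The basic-auth check spinwards describes could be implemented in a Lambda authorizer along these lines. This is a hedged sketch: the credential values are placeholders, and a real authorizer would also need to return the IAM policy document API Gateway expects rather than a bare boolean.

```python
import base64

# Placeholder credentials for illustration only; a real function would load
# these from an environment variable or a secrets store.
EXPECTED_USER = 'powerbi'
EXPECTED_PASSWORD = 'secret'

def check_basic_auth(header):
    # Expect an HTTP header of the form "Basic base64(user:password)"
    if not header or not header.startswith('Basic '):
        return False
    try:
        decoded = base64.b64decode(header[len('Basic '):]).decode('utf-8')
    except (ValueError, UnicodeDecodeError):
        return False
    user, _, password = decoded.partition(':')
    return user == EXPECTED_USER and password == EXPECTED_PASSWORD
```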
Because v4 signature auth is not natively supported in PowerBI and my users may not have the AWS CLI on their machines, another option I have been experimenting with is including and refactoring the Python v4 signature examples from the AWS documentation pages into my Python query. This allows the call to be made without the AWS CLI on the machine, but it requires the user to enter their Access Key and Secret Access Key. As far as I can tell, PowerBI parameters are not automatically exposed to Python queries, so I am looking into exposing them by creating a dataset that contains them, as described here: Does Power BI parameters are exposed to Python globals
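For reference, the core of those AWS documentation examples is the signature v4 signing-key derivation, which needs only the standard library. A sketch of that derivation chain alone (not a complete request signer, which would also build the canonical request and string to sign):

```python
import hashlib
import hmac

def _sign(key, msg):
    # One HMAC-SHA256 step; the raw digest bytes feed the next step
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

def derive_signing_key(secret_key, date_stamp, region, service):
    """SigV4 key derivation: kSecret -> kDate -> kRegion -> kService -> kSigning."""
    k_date = _sign(('AWS4' + secret_key).encode('utf-8'), date_stamp)  # date_stamp like '20240115'
    k_region = _sign(k_date, region)       # e.g. 'us-east-1'
    k_service = _sign(k_region, service)   # e.g. 's3'
    return _sign(k_service, 'aws4_request')
```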