
I want to get some CSV files from a bucket in S3. As there is no connector, I need to go through a query to reach S3. How do I set up my query?

I tried using Postman and copying the info from the preview in Postman, but to no avail.

[Screenshot: Postman headers and URL]

Now I need to get those headers into PowerBI, but PowerBI can't do the conversions the way Postman does. Is there a solution?

[Screenshot: PowerBI query editor]

Joe le clodo

2 Answers


Your accepted answer is great as long as whoever is using the PowerBI query has the AWS CLI on their machine. My use case is that the individual who will be using this PowerBI template wants to log in and be done with authentication from that point.

Another option for accessing an S3 bucket through PowerBI is to use AWS API Gateway and Lambda functions to create a pre-signed URL, as per this question on the Power BI forums. Here is the relevant answer from the user spinwards:

  1. Create a new Lambda Function ... it can be empty for now
  2. Set up a new API in API Gateway
  3. Create a new GET method
    • 3.1 Select Lambda Function for the integration type
    • 3.2 Select the Use Lambda Proxy integration option
    • 3.3 Select the region and type in the name of the lambda function you created in step 1
  4. Edit your lambda function.
    • 4.1 Using the AWS SDK, generate a url w/ pre-signed key for your file
    • 4.2 Return a 303 redirect to the url from step 4.1

Here is a sample of such a Lambda function (spinwards' original targeted Python 2.7; this version adds the missing import and handler wrapper so it runs as written):

    import boto3

    def lambda_handler(event, context):
        bucket = 'bucket-name'
        key = 'path-to-file'

        # Generate a presigned GET url for the object, valid for two hours
        client = boto3.client('s3')
        link = client.generate_presigned_url(
                'get_object',
                Params={'Bucket': bucket, 'Key': key},
                ExpiresIn=7200,
                HttpMethod='GET')

        # With Lambda proxy integration, returning this dict sends the client
        # a 303 redirect to the presigned S3 url
        return {
            "statusCode": 303,
            "headers": {'Location': link}
        }

You can use this PowerBI query as a starting point:

    let
        // Replace the URI with the "invoke url" of your API Gateway endpoint
        // See: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-call-api.html#how-to-call-api-console
        uri = "https://your-web-api.execute-api.us-east-1.amazonaws.com/stage/endpoint",

        // PowerBI should understand the 303 redirect and follow it to your presigned s3 url
        // Make sure to set IsRetry to true so that PowerBI gets a fresh url with a 
        //   fresh presigned key every time
        raw = Web.Contents(uri, [IsRetry=true]), 

        // My file is a gzipped utf-8 tsv, so I need to decompress it
        // and tell the csv parser the delimiter and encoding
        binary = Binary.Decompress(raw, Compression.GZip),
        csv = Csv.Document(binary, [
            Delimiter="#(tab)",
            extraValues=ExtraValues.Ignore,
            Encoding=65001]) // 65001 is utf8
    in
        csv

Once you get everything working with a single file, you can parameterize your API and Lambda function to accept an s3 path so that you can pull in anything from your S3 account.
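Once parameterized, the Lambda has to read the requested path from the incoming request before generating the presigned URL. This is a minimal sketch of that step under my own assumptions (the `key` query parameter and the `extract_s3_key` helper are hypothetical names, not from spinwards' answer); it also rejects path traversal so callers can't escape the intended prefix:

```python
import posixpath

def extract_s3_key(event):
    """Pull the requested S3 key out of an API Gateway proxy event.

    Rejects empty, absolute, and path-traversal keys so the caller
    cannot request objects outside the intended prefix.
    """
    params = event.get('queryStringParameters') or {}
    key = params.get('key', '')
    # Normalize the path and refuse anything that climbs out of the prefix
    normalized = posixpath.normpath(key)
    if not key or key.startswith('/') or normalized.startswith('..'):
        raise ValueError('invalid s3 key: %r' % key)
    return normalized

# Example API Gateway proxy event (abridged)
event = {'queryStringParameters': {'key': 'reports/2023/data.csv'}}
print(extract_s3_key(event))  # reports/2023/data.csv
```

The validated key would then replace the hard-coded `key` variable in the Lambda above, and the PowerBI query would append `?key=...` to the invoke URL.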

Spinwards also offers the following warning about authentication:

Before you roll this out to production you should think about some kind of authentication. The normal AWS IAM authentication tokens don't play nicely with PowerBI, so you will need to use a custom authentication method for your API Gateway: https://docs.aws.amazon.com/apigateway/latest/developerguide/use-custom-authorizer.html. I ended up using basic auth which is not the strongest thing in the world, but API Gateway uses https, so it should be secure enough.
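Spinwards doesn't show his basic-auth check, but the header validation at the core of a custom API Gateway authorizer could look roughly like this (the credential store and function name are hypothetical; a real deployment would keep secrets in something like AWS Secrets Manager rather than in code):

```python
import base64

# Hypothetical in-code credential store, for illustration only
VALID_USERS = {'report-user': 's3cret'}

def check_basic_auth(header_value):
    """Validate an HTTP Basic 'Authorization' header against VALID_USERS."""
    if not header_value or not header_value.startswith('Basic '):
        return False
    try:
        # The token is base64("user:password")
        decoded = base64.b64decode(header_value[len('Basic '):]).decode('utf-8')
        user, _, password = decoded.partition(':')
    except Exception:
        return False
    return VALID_USERS.get(user) == password

token = base64.b64encode(b'report-user:s3cret').decode('ascii')
print(check_basic_auth('Basic ' + token))  # True
```

The authorizer Lambda would run this check against `event['authorizationToken']` (or the `Authorization` header in a request authorizer) and return an IAM policy document allowing or denying the call.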

Because v4 signature auth is not automatically supported in PowerBI, and my users may not have the CLI on their machines, another option I have been experimenting with is including and refactoring the Python v4 signature examples from the AWS documentation pages in my Python query. This allows the call to be made without the AWS CLI on the machine, but requires the user to input their Access Key and Secret Access Key. As far as I can tell, parameters in PowerBI are not automatically exposed to Python queries, so I am looking into exposing them by creating a dataset that contains them, as described here: Does Power BI parameters are exposed to Python globals
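For reference, the heart of those AWS v4 signature examples is the signing-key derivation, a chain of HMAC-SHA256 operations over the date, region, and service. A stdlib-only sketch following the pattern in the AWS docs (the request canonicalization and PowerBI wiring are left out):

```python
import hashlib
import hmac

def _sign(key, msg):
    # One HMAC-SHA256 step in the SigV4 key-derivation chain
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

def get_signature_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key from a secret access key.

    date_stamp is YYYYMMDD, e.g. '20120215'.
    """
    k_date = _sign(('AWS4' + secret_key).encode('utf-8'), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, 'aws4_request')

# Example key from the AWS documentation (a published sample, not a real secret)
key = get_signature_key('wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY',
                        '20120215', 'us-east-1', 'iam')
print(key.hex())
```

The resulting key signs the "string to sign" built from the canonical request; that part is where most refactoring effort goes when embedding the examples in a PowerBI Python query.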

Kashyap
tatertot

OK, so I finally succeeded by using a Python script inside PowerBI.

Because PowerBI can run Python, you can actually import your files or objects directly from S3. I needed one CSV file and used this code to do it:

import boto3
import pandas as pd
import io

bucket = 'name of your bucket'
key = 'name of your file'

# Fetch the object from S3 and parse its body into a pandas dataframe
s3 = boto3.client('s3')
f = s3.get_object(Bucket=bucket, Key=key)
shape = pd.read_csv(io.BytesIO(f['Body'].read()), header=0, index_col=0)
# Replace missing values with 0 so PowerBI gets a clean table
shape = shape.fillna(0)
print(shape)

This transforms the object into a pandas dataframe, which PowerBI is able to read. Make sure PowerBI uses the correct Python environment (one with pandas installed) and that boto3 can find your credentials, registered either via the AWS CLI or in the credentials file (the same applies to the region).
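For reference, boto3 reads those credentials and the region from `~/.aws/credentials` and `~/.aws/config`; a minimal setup looks like this (the key values below are AWS's published placeholders, not real credentials):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY

# ~/.aws/config
[default]
region = us-east-1
```

Running `aws configure` writes these two files for you.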

Joe le clodo
  • Hey Joe! This works fine for me on PowerBI Desktop, but since boto3 is not in the supported Python packages online (see: https://learn.microsoft.com/en-us/power-bi/connect-data/service-python-packages-support), I can't get this working. Did you resolve this? – amro_ghoneim Feb 28 '23 at 20:38