
I am writing a Lambda function that has to read a Parquet file, for which I am using the pyarrow package. It works fine on my local machine with the line of code below.

pq_raw = pq.read_table(source='C:\\Users\\xxx\\Desktop\\testfolder\\yyyy.parquet')

Now I want to recreate the same functionality in the Lambda function, with the file being in an S3 location. How can this be done?

  • Check out this post: https://stackoverflow.com/questions/45043554/how-to-read-a-list-of-parquet-files-from-s3-as-a-pandas-dataframe-using-pyarrow The accepted answer seems to address your use case. – Captain Caveman Nov 03 '22 at 21:33
  • Please provide your lambda code showing what you have tried and explain why it does not work. Any errors? – Marcin Nov 04 '22 at 00:38

1 Answer


I was able to read the file with the method below: download the object from S3 into memory and hand the bytes to pyarrow.

    import boto3
    import pyarrow.parquet as pq
    from io import BytesIO

    s3_client = boto3.client('s3')
    obj = s3_client.get_object(Bucket=s3_bucket, Key=filekey)
    pq_raw = pq.read_table(source=BytesIO(obj['Body'].read()))