
I am writing a Lambda function that has to read a Parquet file, for which I am using the pyarrow package. It works fine on my local machine with the line of code below.

pq_raw = pq.read_table(source='C:\\Users\\xxx\\Desktop\\testfolder\\yyyy.parquet')

Now I want to recreate the same functionality in the Lambda function, with the file being in an S3 location. How can this be done?

  • Check out this post: https://stackoverflow.com/questions/45043554/how-to-read-a-list-of-parquet-files-from-s3-as-a-pandas-dataframe-using-pyarrow The accepted answer seems to address your use case. – Captain Caveman Nov 03 '22 at 21:33
  • Please provide your lambda code showing what you have tried and explain why it does not work. Any errors? – Marcin Nov 04 '22 at 00:38

1 Answer


I was able to read the file with the method below: download the object from S3 into memory and hand the bytes to pyarrow.

    import boto3
    import pyarrow.parquet as pq
    from io import BytesIO

    s3_client = boto3.client('s3')
    obj = s3_client.get_object(Bucket=s3_bucket, Key=filekey)
    pq_raw = pq.read_table(source=BytesIO(obj['Body'].read()))