
I want to read parquet files from an S3 bucket.

Here's my code:

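```python
import pandas as pd

# bucket is a boto3 S3 Bucket resource; key, secret and relevant_columns are defined elsewhere
for obj in bucket.objects.filter(Prefix='some_prefix/'):
    response = obj.get()
    df = pd.read_parquet(response['Body'], columns=relevant_columns)

    # some data processing

    df.to_csv('some_path',
              storage_options={'key': key, 'secret': secret}, index=False)
```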

I get this error:

```
ArrowInvalid: Called Open() on an uninitialized FileSource
```
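A Parquet file keeps its metadata in a footer at the end of the file: it ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. A reader therefore has to seek to the end before it can read any column, and a forward-only HTTP stream such as boto3's `StreamingBody` cannot seek. That is a plausible cause of the error above, though not confirmed here. The stdlib-only sketch below illustrates the idea; `ForwardOnlyStream` and `footer_length` are hypothetical names, and the toy bytes imitate only the Parquet trailer (they are not a real Parquet file).

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


class ForwardOnlyStream:
    """Hypothetical stand-in for a streaming HTTP body: read() works, seek() does not."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)

    def seekable(self) -> bool:
        return False


def footer_length(f) -> int:
    """Return the metadata length stored in a Parquet-style trailer.

    Requires seeking to the last 8 bytes: 4-byte little-endian length + b"PAR1".
    """
    if not f.seekable():
        raise io.UnsupportedOperation("cannot locate the footer without seeking")
    f.seek(-8, io.SEEK_END)
    tail = f.read(8)
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (length,) = struct.unpack("<I", tail[:4])
    return length


# Toy bytes imitating only the Parquet layout (not a real Parquet file).
toy = PARQUET_MAGIC + b"\x00" * 10 + struct.pack("<I", 10) + PARQUET_MAGIC

try:
    footer_length(ForwardOnlyStream(toy))
except io.UnsupportedOperation as exc:
    print("forward-only stream:", exc)

print("buffered:", footer_length(io.BytesIO(toy)))  # buffered: 10
```

Copying the stream into an `io.BytesIO` buffer is what makes the second call succeed: the buffer supports `seek()`, so the trailer can be found.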
Asked by qwerty; edited by John Rotenstein.
Which line is generating the error? If you look at `response['Body']`, does it contain the information you would expect? The `read_parquet()` documentation says that the first parameter (`path`) should be a _"str, path object or file-like object"_; however, I suspect that your code is passing a `StreamingBody` rather than a string. You'll need to convert it. See: [Open S3 object as a string with Boto3](https://stackoverflow.com/a/35376156/174777) – John Rotenstein Oct 03 '22 at 05:32
