boto3 S3 Object Parsing

Question

I'm trying to write a Python script for processing audio data stored on S3.

I fetch the S3 object with this function:

def grabAudio(filename, directory):
    obj = s3client.get_object(Bucket=bucketname, Key=directory+'/'+filename)
    return obj['Body'].read()

Accessing the data using

print(obj['Body'].read())

yields the correct audio data, so it's reading from the bucket just fine.

When I try to then use this data in my audio processing library (pydub), it fails:

audio = AudioSegment.from_wav(grabAudio(filename, bucketname))

Traceback (most recent call last):
  File "split_audio.py", line 38, in <module>
    audio = AudioSegment.from_wav(grabAudio(filename, bucketname))
  File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 544, in from_wav
    return cls.from_file(file, 'wav', parameters)
  File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 456, in from_file
    file.seek(0)
AttributeError: 'bytes' object has no attribute 'seek'

What is the format of the object coming in from S3? A byte array, I presume? If so, is there a way to parse it as a .wav without saving it to disk? I'd like to avoid writing to disk.

Also open to alternative audio processing libraries.

You can use `io.BytesIO` to create a file-like object from `bytes` and pass it to your library: https://stackoverflow.com/a/44437265/200603 — Linas Valiukas, Feb 26 '18 at 23:57
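A quick illustration of the comment's point: `obj['Body'].read()` returns a plain `bytes` object, which has no `seek()` method (hence the traceback), while `io.BytesIO` wraps the same bytes in a seekable, file-like buffer. The byte string below is a hypothetical placeholder, not real audio.

```python
import io

# Hypothetical placeholder for what obj['Body'].read() returns: a plain bytes object
raw = b"RIFF\x00\x00\x00\x00WAVE"

print(type(raw).__name__)    # bytes
print(hasattr(raw, "seek"))  # False: this is why pydub's file.seek(0) fails

# Wrap the same bytes in an in-memory, file-like buffer
buf = io.BytesIO(raw)
print(hasattr(buf, "seek"))  # True
buf.seek(0)                  # rewind, as pydub's from_file does in the traceback
print(buf.read(4))           # b'RIFF'
```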

score 3 · Answer 1 · answered Feb 27 '18 at 16:33

Thanks to Linas for linking a similar issue, and Jiaaro for the answer.

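import io
from pydub import AudioSegment

# From the linked answer: y['data'] holds raw audio bytes, x is the export target
s = io.BytesIO(y['data'])
AudioSegment.from_file(s).export(x, format='mp3')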

This lets me pull the object straight from the bucket into memory:

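import io
from pydub import AudioSegment

obj = s3client.get_object(Bucket=bucketname, Key=customername+'/'+filename)
# Wrap the raw bytes in a file-like object so pydub can seek() on it
data = io.BytesIO(obj['Body'].read())
audio = AudioSegment.from_file(data)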
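Since the question is also open to other libraries: if the files are uncompressed PCM WAV, Python's standard-library `wave` module also accepts a file-like object, so the same `io.BytesIO` approach works without pydub or ffmpeg. This is a minimal sketch, assuming PCM WAV input; the helper names `make_silent_wav` and `describe_wav` are hypothetical, and the silent clip is generated in memory only as a stand-in for the S3 bytes (with S3 you would pass `obj['Body'].read()` instead).

```python
import io
import wave

def make_silent_wav(seconds=1, rate=8000):
    """Hypothetical helper: build a mono 16-bit PCM WAV in memory (stand-in for S3 bytes)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * (seconds * rate))
    # wave does not close a file object it did not open, so the buffer is still readable
    return buf.getvalue()

def describe_wav(data):
    """Hypothetical helper: parse WAV bytes entirely in memory and return basic properties."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),
            "frame_rate": w.getframerate(),
            "frames": w.getnframes(),
        }

wav_bytes = make_silent_wav()
print(describe_wav(wav_bytes))
# {'channels': 1, 'sample_width': 2, 'frame_rate': 8000, 'frames': 8000}
```

For MP3 or other compressed formats, `wave` will not help, and pydub (backed by ffmpeg) remains the simpler route.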
