Split audio file into several files, each below a size threshold

Question

I have a FLAC file which I need to split into several distinct FLAC files, each of which must be below 100 MB in size. Are there any UNIX tools which can do this for me? Can I implement this logic myself?

Side-note: since FLAC is compressed, I figure that the easiest solution will require first converting the file to WAV.

score 8 · Accepted Answer · edited Jun 12 '20 at 06:58

There are two parts to your question.

1) Convert the existing FLAC file to an uncompressed format such as WAV.
2) Split the converted WAV file into chunks of a specific size.

There is, of course, more than one way to do this, but pydub makes both steps easy. See the pydub documentation for details.

1) Convert the existing FLAC file to another format such as WAV

With pydub you can read the FLAC file and export it as WAV:

from pydub import AudioSegment

flac_audio = AudioSegment.from_file("sample.flac", "flac")
flac_audio.export("audio.wav", format="wav")

2) Split the converted WAV file into chunks of a specific size

Again, there are various ways to do this. I determined the total duration and size of the converted WAV file, then used those to work out how many seconds of audio fit in the desired chunk size.

The sample WAV file I used was 101,612 KB and about 589 seconds long (just under 10 minutes).

WAV file size by observation:

Stereo, 16-bit WAV files at a 44.1 kHz frame rate take approximately 10 MB per minute; 48 kHz files are a little larger. The corresponding mono file takes about 5 MB per minute.

The approximation holds for our sample file, at about 10 MB per minute.

WAV file size by math:

The relation between WAV file size and duration is given by

wav_file_size_in_bytes = (sample_rate (44100) * bit_depth (16) * number_of_channels (2 for stereo) * number_of_seconds) / 8 (8 bits = 1 byte)

Source: http://manual.audacityteam.org/o/man/digital_audio.html
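As a quick check of the formula, here is a minimal sketch that plugs in the sample file's numbers. The function name `wav_data_size_bytes` is my own, and the result covers only the raw audio data; a real WAV file adds a small header (typically 44 bytes) on top.

```python
def wav_data_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of uncompressed PCM audio data in bytes (header not included)."""
    return sample_rate * bit_depth * channels * seconds // 8


# Values from the sample file: 44.1 kHz, 16-bit, stereo, 589 seconds.
size = wav_data_size_bytes(44100, 16, 2, 589)
print(size)  # 103899600

# One minute of the same format, for comparison with the rule of thumb.
per_minute = wav_data_size_bytes(44100, 16, 2, 60)
print(per_minute)  # 10584000
```

The per-minute figure of roughly 10.6 million bytes agrees with the "about 10 MB per minute" observation.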

The formula I used to calculate the chunk duration:

The chunk duration follows from a simple proportion:

A duration of X seconds gives a WAV file size of Y bytes.
So what is the duration in seconds (K) for a file size of 10 MB?

This gives K = X * 10 MB / Y
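A worked instance of this proportion with the sample file's numbers may help. This is only a sketch: the function name `chunk_duration_sec` is hypothetical, and rounding down is an assumption I make so that each chunk stays at or below the target. Rounding up to 57 seconds would give 57 * 176,400 = 10,054,800 bytes, slightly over 10 MB.

```python
import math


def chunk_duration_sec(duration_sec, total_bytes, target_bytes):
    """Whole seconds of audio whose share of the file fits in target_bytes."""
    return math.floor(duration_sec * target_bytes / total_bytes)


# X = 589 seconds, Y = 103,899,600 bytes, target = 10,000,000 bytes
k = chunk_duration_sec(589, 103899600, 10 * 1000 * 1000)
print(k)  # 56
```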

pydub.utils has a make_chunks function that splits audio into chunks of a given duration (in milliseconds). We compute the duration for the desired size using the formula above.

We then use that duration to create chunks of about 10 MB each and export each chunk separately. The last chunk may be smaller, depending on the total size.

Here is the working code.

from pydub import AudioSegment
from pydub.utils import make_chunks

# Convert the FLAC file to WAV, then load the WAV for splitting
flac_audio = AudioSegment.from_file("sample.flac", "flac")
flac_audio.export("audio.wav", format="wav")
myaudio = AudioSegment.from_file("audio.wav", "wav")
channel_count = myaudio.channels        # number of channels
sample_width = myaudio.sample_width     # bytes per sample
duration_in_sec = len(myaudio) // 1000  # len() is in milliseconds
sample_rate = myaudio.frame_rate

print("sample_width= {0}".format(sample_width))
print("channel_count= {0}".format(channel_count))
print("duration_in_sec= {0}".format(duration_in_sec))
print("frame_rate= {0}".format(sample_rate))
bit_depth = sample_width * 8  # bits per sample (16 for the sample file)

wav_file_size = (sample_rate * bit_depth * channel_count * duration_in_sec) // 8
print("wav_file_size =  {0}".format(wav_file_size))

# 10 MB, i.e. 10,000,000 bytes; use 100000000 for the 100 MB limit in the question
file_split_size = 10000000

# Chunk duration K for the target size: K = X * file_split_size / Y,
# where X is the total duration in seconds and Y is the WAV size in bytes.
# Integer division rounds down, so each chunk stays below the target size.
chunk_length_in_sec = (duration_in_sec * file_split_size) // wav_file_size
chunk_length_ms = chunk_length_in_sec * 1000
chunks = make_chunks(myaudio, chunk_length_ms)

# Export each chunk as a separate WAV file
for i, chunk in enumerate(chunks):
    chunk_name = "chunk{0}.wav".format(i)
    print("exporting {0}".format(chunk_name))
    chunk.export(chunk_name, format="wav")

Output:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
sample_width= 2
channel_count= 2
duration_in_sec= 589
frame_rate= 44100
wav_file_size =  103899600
exporting chunk0.wav
exporting chunk1.wav
exporting chunk2.wav
exporting chunk3.wav
exporting chunk4.wav
exporting chunk5.wav
exporting chunk6.wav
exporting chunk7.wav
exporting chunk8.wav
exporting chunk9.wav
exporting chunk10.wav
>>>
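If pydub and ffmpeg are not available, the WAV-splitting step can also be done with Python's standard `wave` module. This is a different technique from the pydub code above, offered only as a sketch: the function name `split_wav` and its parameters are my own, it handles uncompressed PCM WAV input only, and it assumes the 44-byte header that `wave` writes for such files.

```python
import wave


def split_wav(path, max_bytes, prefix="chunk"):
    """Split a PCM WAV file into files of at most max_bytes each."""
    header_allowance = 44  # assumed size of a canonical PCM WAV header
    names = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frame_size = params.nchannels * params.sampwidth  # bytes per frame
        frames_per_chunk = (max_bytes - header_allowance) // frame_size
        if frames_per_chunk <= 0:
            raise ValueError("max_bytes is too small to hold a single frame")
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break  # no audio left to read
            name = "{0}{1}.wav".format(prefix, index)
            with wave.open(name, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(data)
            names.append(name)
            index += 1
    return names
```

A call such as `split_wav("audio.wav", 100 * 1000 * 1000)` would write `chunk0.wav`, `chunk1.wav`, and so on, and return their names. Splitting on whole frames keeps every channel's samples together.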

bonus points for producing flac files (i know its more complex because of the flac compression and variable bitrate) — milahu, Feb 28 '22 at 10:09
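For FLAC output the size of a chunk cannot be computed from its duration, so one possible approach is to encode a candidate slice, measure it, and shorten it until it fits. The sketch below shows only that planning loop. The names `plan_segments` and `encoded_size` are hypothetical, and `encoded_size(start_ms, end_ms)` is a callback you would supply yourself; it is not part of pydub.

```python
def plan_segments(total_ms, limit_bytes, encoded_size, first_guess_ms):
    """Return (start_ms, end_ms) pairs whose encoded size is at most limit_bytes.

    encoded_size(start_ms, end_ms) must return the size in bytes of that
    slice after encoding; it is supplied by the caller.
    """
    segments = []
    start = 0
    while start < total_ms:
        end = min(start + first_guess_ms, total_ms)
        # Shrink the slice by 10% at a time until it fits the limit.
        while encoded_size(start, end) > limit_bytes:
            length = int((end - start) * 0.9)
            if length < 1:
                raise ValueError("limit_bytes is too small for any slice")
            end = start + length
        segments.append((start, end))
        start = end
    return segments
```

With pydub, the callback could export `myaudio[start_ms:end_ms]` to a temporary file with `export(path, format="flac")` and return `os.path.getsize(path)`, assuming the local ffmpeg build includes a FLAC encoder. Each probe re-encodes the slice, so a first guess close to the final duration saves time.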

score 0 · Answer 2 · answered Nov 05 '22 at 05:09

I copied the code from here and wrapped it in a function. Maybe it will help someone!

import math
import os

from hurry.filesize import size
from pydub import AudioSegment
from pydub.utils import which, make_chunks

AudioSegment.converter = which("ffmpeg")

def mp3_to_chunks(link:str, mb_split:int=49283072, i_format:str="mp4", o_format:str="wav", filename_to_save:str="chunk"):
    # Convert the input to WAV, then load the WAV for splitting
    source_audio = AudioSegment.from_file(link, format=i_format)
    source_audio.export("audio.wav", format="wav")
    myaudio = AudioSegment.from_file("audio.wav", "wav")
    channel_count = myaudio.channels
    sample_width = myaudio.sample_width    # bytes per sample
    duration_in_sec = len(myaudio) / 1000  # len() is in milliseconds
    sample_rate = myaudio.frame_rate
    bit_depth = sample_width * 8           # bits per sample
    wav_file_size = (sample_rate * bit_depth * channel_count * duration_in_sec) / 8
    file_split_size = mb_split  # target chunk size in bytes
    # Round down so that each chunk stays below the target size
    chunk_length_in_sec = math.floor((duration_in_sec * file_split_size) / wav_file_size)
    chunk_length_ms = chunk_length_in_sec * 1000
    chunks = make_chunks(myaudio, chunk_length_ms)

    # Export each chunk and remember its file name
    list_chunks = []
    for i, chunk in enumerate(chunks):
        chunk_name = f"{filename_to_save}{i}.{o_format}"
        list_chunks.append(chunk_name)
        chunk.export(chunk_name, format=o_format)

    # Report sizes on disk (os.path.getsize avoids reading whole files into memory)
    print(f"Converted WAV file size: {size(os.path.getsize('audio.wav'))}")

    for chunk_name in list_chunks:
        print(f"Size for {chunk_name}: {size(os.path.getsize(chunk_name))}")

    print("Check the content! Files are saved.")
    return list_chunks


mp3_to_chunks('/content/Never Going Back Mashup  Best of 2021  Neha Kakkar Atif Aslam Jubin Nautiyal Emraan Hashmi.mp4')
