1

I am using WAV files to train a deep learning model. They have different lengths, so I want to pad all of them to a length of 16 seconds using Python.

Amin
  • Might check out https://stackoverflow.com/questions/46757852/adding-silent-frame-to-wav-file-using-python – tgikal Oct 16 '18 at 18:05
  • This could be done with `scipy.io.wavfile`, pydub, or pure Python (using the `wave` module, though that is lower-level and a bit more tedious). Do you care which tool is used? – tom10 Oct 16 '18 at 22:22
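
The pure-Python route mentioned in the comment above can be sketched with the standard-library `wave` module. This is only a minimal sketch, assuming uncompressed PCM files that are not longer than the target; `pad_wav_to_length` and the file paths are hypothetical names chosen for illustration:

```python
import wave


def pad_wav_to_length(src_path, dst_path, target_seconds=16.0):
    """Copy src_path to dst_path, appending silence up to target_seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    # Target length expressed in frames (one frame = one sample per channel).
    target_frames = int(round(target_seconds * params.framerate))
    missing = target_frames - params.nframes
    if missing < 0:
        raise ValueError(f"{src_path} is longer than {target_seconds} s")

    bytes_per_frame = params.sampwidth * params.nchannels
    # 8-bit PCM is unsigned, so silence is 0x80; wider samples are signed, so silence is 0x00.
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silent_byte * (missing * bytes_per_frame)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(frames + silence)
    return target_frames
```

Calling `pad_wav_to_length('in.wav', 'out.wav')` on each file in a loop would bring a whole dataset to 16 seconds; files longer than the target raise an error here rather than being truncated silently.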

3 Answers

3

If I understood correctly, the question wants to fix all lengths to a given length. Therefore, the solution will be slightly different:

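from pydub import AudioSegment

wav_path = 'your-wav-file.wav'  # input file
pad_ms = 1000  # the fixed total length you want, in milliseconds (e.g. 16000 for 16 s)

audio = AudioSegment.from_wav(wav_path)
# len(audio) is the duration in milliseconds
assert pad_ms > len(audio), "Audio was longer than the target length. Path: " + wav_path
# The extra 1 ms presumably guards against rounding in len(audio).
# Matching the frame rate avoids resampling when the segments are joined.
silence = AudioSegment.silent(duration=pad_ms - len(audio) + 1, frame_rate=audio.frame_rate)

padded = audio + silence  # Adding silence after the audio
padded.export('padded-file.wav', format='wav')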

This answer differs from the other pydub answer: this one pads every audio file to the same total length, whereas the other appends the same amount of silence to the end of each file.

J Agustin Barrachina
1

Using pydub:

from pydub import AudioSegment

pad_ms = 1000  # milliseconds of silence needed
silence = AudioSegment.silent(duration=pad_ms)
audio = AudioSegment.from_wav('your-wav-file.wav')

padded = audio + silence  # Adding silence after the audio
padded.export('padded-file.wav', format='wav')

AudioSegment objects are immutable, so `audio + silence` returns a new segment and leaves `audio` unchanged.

Vikrant Sharma
0

You can use Librosa. The `librosa.util.fix_length` function pads the audio with silence by appending zeros to the end of the NumPy array containing the audio data:

import numpy as np
from librosa import load
from librosa.util import fix_length


file_path = 'dir/audio.wav'

sf = 44100  # sampling frequency of the wav file
required_audio_size = 5  # target length in seconds (here a 2-second clip is padded to 5 seconds)
audio, sf = load(file_path, sr=sf, mono=True)  # mono=True converts stereo audio to mono
padded_audio = fix_length(audio, size=required_audio_size * sf)  # target array size = seconds * sampling frequency


print('Array length before padding', np.shape(audio))
print('Audio length before padding in seconds', (np.shape(audio)[0] / sf))
print('Array length after padding', np.shape(padded_audio))
print('Audio length after padding in seconds', (np.shape(padded_audio)[0] / sf))

Output:

Array length before padding (88200,)
Audio length before padding in seconds 2.0
Array length after padding (220500,)
Audio length after padding in seconds 5.0
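
For reference, the zero-padding itself needs nothing beyond NumPy. The following is a minimal sketch of the idea, not Librosa's implementation: `pad_or_trim` is a hypothetical helper name, it assumes the samples run along the last axis, and it also cuts clips that are longer than the target:

```python
import numpy as np


def pad_or_trim(samples, target_len):
    """Return samples with exactly target_len entries along the last axis."""
    samples = np.asarray(samples)
    n = samples.shape[-1]
    if n >= target_len:
        # Too long (or already the right size): keep the first target_len samples.
        return samples[..., :target_len]
    # Too short: append zeros (digital silence) at the end of the last axis only.
    pad_width = [(0, 0)] * (samples.ndim - 1) + [(0, target_len - n)]
    return np.pad(samples, pad_width, mode="constant")
```

For the 16-second target in the question, `target_len` would be `16 * sf`, where `sf` is the sampling frequency of the loaded audio.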

That said, after looking through a number of similar questions, it seems that `pydub.AudioSegment` is the go-to solution.

Maria