I need to convert mp3 files to wav files that will then be used by a Python script. (The script analyzes the wav data as numpy arrays, one per channel.) That all works fine. To get the wav, I first used Audacity "by hand": I opened the mp3, then used "Export as WAV". All good, but I now need to automate this and avoid using Audacity. So I modified my Python script to first run ffmpeg to do the conversion, using:

subprocess.call(["ffmpeg", "-i", mp3filename, wavfilename])
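A minimal sketch of the conversion step, with each argument passed as its own list element and the output codec requested explicitly. The helper names `build_ffmpeg_cmd` and `convert_mp3_to_wav` are hypothetical, and the sketch assumes `ffmpeg` is on the PATH; `-c:a pcm_s16le` only pins the sample format and is not claimed to reproduce Audacity's export.

```python
import subprocess


def build_ffmpeg_cmd(mp3_path, wav_path):
    """Return the ffmpeg argument list for an MP3 -> 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", mp3_path,       # input file
        "-c:a", "pcm_s16le",  # request signed 16-bit little-endian PCM explicitly
        wav_path,
    ]


def convert_mp3_to_wav(mp3_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_ffmpeg_cmd(mp3_path, wav_path), check=True)
```

Using `check=True` means a failed conversion raises instead of silently leaving a missing or stale wav file for the analysis step.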

Looking at the data in the resulting wav file, I find it is totally different from what is in the Audacity-produced file. I verified that both approaches are using 16-bit signed little-endian PCM, the same sample rate, etc.

Now, if I take that ffmpeg-produced wav file, open it in Audacity, and then "Export as WAV", it produces a third wav that works just fine and looks identical to the one I created with Audacity in the first place. When I say identical, I mean that I compared the numpy arrays element by element and they line right up, whereas the ffmpeg data does not correlate with the Audacity data at all.
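The element-by-element comparison described above can be sketched roughly as follows. This version uses only the standard library (`wave` and `array`) rather than numpy so that it stays dependency-free; the function names `read_wav_channels` and `compare_channels` are hypothetical, and the sketch assumes 16-bit PCM input.

```python
import sys
import wave
from array import array


def read_wav_channels(path):
    """Return (sample_rate, [channel_0, channel_1, ...]) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array("h")          # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":    # WAV sample data is little-endian
        samples.byteswap()
    # Samples are interleaved (L, R, L, R, ...); de-interleave by striding.
    return rate, [samples[ch::n_channels] for ch in range(n_channels)]


def compare_channels(a, b):
    """Return (len(a), len(b), number of mismatches over the common prefix)."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return len(a), len(b), mismatches
```

Reporting both lengths alongside the mismatch count makes it obvious when the two files differ in sample count rather than only in sample values.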

I am using the very latest ffmpeg, ffmpeg version N-99557-g6bdfea8d4b with Lavf58.62.100. Audacity is 2.3.2. I have tried using pydub also but that gives the same results as ffmpeg.

Clearly, Audacity is doing something differently when it exports the wav, and even though the ffmpeg data looks very different in Python, Audacity plays it just fine - so it is also able to correctly account for whatever it is that ffmpeg did when it made the wav file. I would simply like to get ffmpeg to emulate what Audacity does. Any advice or insight is greatly appreciated.

  • Surely the audio data is identical and you're looking at metadata that may have been added to the WAV file. – Brad Oct 15 '20 at 23:53
  • Run the [hash muxer](https://ffmpeg.org/ffmpeg-formats.html#hash) on each input and see if you get the same result: `ffmpeg -v error -i input.wav -f hash -` – llogan Oct 16 '20 at 01:22
  • @Brad Thanks for the response. I have long suspected this was the issue. For a small mp3 file that I have used as a test case, the wav file created by ffmpeg gives a numpy array of length 57339 for each channel, while the Audacity-built wav produces arrays of length 59904 in each channel - 2565 additional integers. But even if I try shifting the longer array relative to the shorter one, they do not match up. It is not clear, and I can find no documentation, as to how the metadata finds its way into these numpy arrays and how I can remove it. – MJLerxst Oct 16 '20 at 01:33
  • I don't use Python... I don't know what a Numpy array is, but if you post a couple WAV file samples, I can take a look. – Brad Oct 16 '20 at 01:40
  • @llogan I did this and got different results. `ffmpeg -v error -i ffmpeg.wav -f hash -` printed `SHA256=37cb63c1a23bbe1303584218323344fd039dcf2658d631e11d6fe420a1a6a935`, and `ffmpeg -v error -i audacity.wav -f hash -` printed `SHA256=a3ed415166d90bf8f0a2d1fc3cbd240a827fd59c0cfaaabd2dd4c1b95ac684b3`. – MJLerxst Oct 16 '20 at 01:41
  • @Brad Thank you. The following are samples about 1 second long, created from the same mp3 with the two methods: https://drive.google.com/file/d/1ucZbu-MymJB73Zw57JHrbDI2XQoLQPCl/view?usp=sharing and https://drive.google.com/file/d/1wJCbJkFNamCqY6PibFZIsgCaBcQjZkYx/view?usp=sharing – MJLerxst Oct 16 '20 at 01:54
  • After more searching, I found a thread that perhaps documents the same issue: https://stackoverflow.com/questions/48923943/why-extract-sample-from-wav-file-using-udacity-gui-and-scipy-give-different-valu?rq=1. This would indicate that Audacity converts to 32-bit but then, upon export, dithers back down to 16-bit, a step that ffmpeg does not take during conversion. A reasonable explanation? Thanks – MJLerxst Oct 16 '20 at 12:28
  • @MJLerxst I didn't try, but it seems like a good possibility. Not the same issue exactly, but it reminds me of [this old answer](https://stackoverflow.com/a/13694891/). – llogan Oct 16 '20 at 18:05
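
One observation on the lengths quoted in the comments: 59904 is exactly 52 × 1152, and 1152 is the number of samples in an MPEG-1 Layer III frame. That is consistent with (but does not prove) the Audacity file containing every decoded frame in full while the ffmpeg file has 2565 samples of encoder delay and padding trimmed. If dither is also involved, aligned samples would differ by small amounts rather than match exactly, so an exact-equality shift test could fail even at the right offset. A rough sketch of a tolerant alignment search follows; `best_lag` and its `window` parameter are hypothetical names, and this is a brute-force illustration rather than a tuned implementation.

```python
def best_lag(longer, shorter, max_lag, window=2000):
    """Return (lag, mean_abs_diff) minimising the mean absolute difference
    between shorter[:window] and longer[lag:lag + window]."""
    n = min(window, len(shorter))
    if n == 0:
        raise ValueError("shorter sequence is empty")
    best = (0, float("inf"))
    # Try every candidate offset of the shorter signal inside the longer one.
    for lag in range(0, min(max_lag, len(longer) - n) + 1):
        total = sum(abs(longer[lag + i] - shorter[i]) for i in range(n))
        score = total / n
        if score < best[1]:
            best = (lag, score)
    return best
```

A mean absolute difference near zero (on the order of one least-significant bit) at some lag would point to trimming plus dither; a large value at every lag would point to something else. The comparison window should cover audible material, since leading silence matches at many offsets.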

0 Answers