
I have a large bytes object (raw data from a 16-bit WAVE file with about 8 million samples) that I need to convert to a list of integers for some processing. So far I have used a list comprehension with int.from_bytes for the conversion, but I have noticed it takes a considerable amount of time, and I am wondering whether there is a faster solution.

Here is my current method:

data = [int.from_bytes(raw[i * sampwidth:(i + 1) * sampwidth], "little", signed=True) for i in range(len(raw) // sampwidth)]

On my machine this method takes about 9 seconds per file (I have multiple files) on a single core, and I would like to know whether I am pushing Python's limits or whether a faster method exists.
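For context, a minimal self-contained sketch of this approach: it writes a tiny 16-bit WAVE file with the standard-library wave module (the file name "example.wav" and the sample values are placeholders, not from the original post) and then decodes the raw frames with the comprehension above.

```python
import struct
import wave

# Write a tiny 16-bit mono WAVE file so the example is self-contained;
# in practice this would be one of the real input files.
samples = [0, 1000, -1000, 32767, -32768]
with wave.open("example.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 2 bytes per sample = 16-bit audio
    w.setframerate(44100)
    w.writeframes(struct.pack("<5h", *samples))

# Read the raw frame data back, as in the question.
with wave.open("example.wav", "rb") as wav:
    sampwidth = wav.getsampwidth()
    raw = wav.readframes(wav.getnframes())

# The comprehension from the question, decoding little-endian signed samples.
data = [int.from_bytes(raw[i * sampwidth:(i + 1) * sampwidth], "little", signed=True)
        for i in range(len(raw) // sampwidth)]
```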

Cosinux

2 Answers


If you can use SciPy (which has a lot of other nice signal processing functions), you can use scipy.io.wavfile.read:

import scipy.io.wavfile
rate, data_np_ary = scipy.io.wavfile.read('example.wav')
howderek
  • Thanks. I will take a look at SciPy, although I feel like it might be a bit of overkill for the simple task I am trying to solve. Nonetheless, I am happy to learn about a single-function solution for loading a WAVE file into a NumPy array, which seems to be the answer to my performance issues. – Cosinux Aug 29 '19 at 02:11

It seems like NumPy really is the way to go. It managed to load all 12 WAVE files (and do a simple stereo-to-mono conversion) in just over a second, and the code is also more elegant. The only downside of this method is that it only supports 1-, 2-, 4-, and 8-byte integers, but since I am dealing with audio data, this will not be an issue.

The new NumPy solution:

data = numpy.frombuffer(raw, numpy.int16)
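The answer also mentions a stereo-to-mono step; here is a minimal sketch of how that might look, assuming interleaved 16-bit stereo data (the sample values are illustrative, not from the original files):

```python
import numpy as np

# Two interleaved stereo channels (L, R, L, R, ...) as raw
# little-endian 16-bit bytes, standing in for the WAVE frame data.
raw = np.array([100, 200, 300, 500, -100, -300], dtype=np.int16).tobytes()

# The one-line conversion from the answer.
data = np.frombuffer(raw, np.int16)

# Reshape to (frames, 2) and average the channels; widen to int32
# first so the sum of two int16 values cannot overflow.
mono = data.reshape(-1, 2).astype(np.int32).mean(axis=1).astype(np.int16)
```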
Cosinux