
How can I convert MATLAB's quantiz function (where xd is the decimated signal) into Python/SciPy?

I am trying to turn an algorithm I developed in MATLAB for speech processing into a complete software package in Python, using libraries such as SciPy, NumPy, PyGTK, and matplotlib.

I am using SciPy for the algorithm development, but I can't find an appropriate function to quantize the signal in Python:

[I,xq] = quantiz(xd,1:step:1-step, -1:step:1);

How would I write this in python?

  • Voice-to-speech recognition is an *incredibly* broad field. There are whole companies that dedicate billions in capital to this alone. Unless you can provide some concrete implementations/parts where you're stuck, wait a while until you ask this question. It's just too broad. – Makoto Apr 13 '13 at 04:11
  • @Makoto: I think they did specify a concrete part where they're stuck: they need an equivalent to the MATLAB `quantiz` function. – icktoofay Apr 13 '13 at 04:39

1 Answer


Looking at the documentation, it's a pretty simple function, and easy enough to write in Python. I renamed the function to add the missing 'e' because it was annoying me. Anyway:

def quantize(signal, partitions, codebook):
    indices = []
    quanta = []
    for datum in signal:
        index = 0
        while index < len(partitions) and datum > partitions[index]:
            index += 1
        indices.append(index)
        quanta.append(codebook[index])
    return indices, quanta

Trying it with the example in the documentation:

>>> index, quants = quantize([3, 34, 84, 40, 23], range(10, 90, 10), range(10, 100, 10))
>>> index
[0, 3, 8, 3, 2]
>>> quants
[10, 40, 90, 40, 30]
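If NumPy is available, the general (arbitrary-partition) case above can also be vectorized directly. This sketch uses `np.searchsorted`, which should match the linear scan since both count the partition boundaries strictly below each datum (the function name here is my own):

```python
import numpy as np

def np_quantize_general(signal, partitions, codebook):
    # searchsorted with side='left' returns, for each datum, the number of
    # partition boundaries strictly less than it -- the same index the
    # while-loop computes with its strict `>` comparison
    partitions = np.asarray(partitions)
    codebook = np.asarray(codebook)
    indices = np.searchsorted(partitions, np.asarray(signal), side='left')
    return indices, codebook[indices]
```

On the documentation example this reproduces `[0, 3, 8, 3, 2]` and `[10, 40, 90, 40, 30]`, and it keeps the flexibility of arbitrary partition and codebook arrays.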

For a slightly more efficient but less flexible version, we can bypass the ranges and just use mathematics:

from __future__ import division
import math

def opt_quantize(signal, num_quanta, partition_start, partition_step,
                 codebook_start, codebook_step):
    indices = []
    quanta = []
    for datum in signal:
        index = int(math.floor((datum - partition_start) / partition_step + 1))
        if index < 0:
            index = 0
        if index >= num_quanta:
            index = num_quanta - 1
        indices.append(index)
        quanta.append(codebook_start + codebook_step * index)
    return indices, quanta

Trying it with the example in the documentation:

>>> index, quants = opt_quantize([3, 34, 84, 40, 23], 9, 10, 10, 10, 10)
>>> index
[0, 3, 8, 4, 2]
>>> quants
[10, 40, 90, 50, 30]

So the results differ slightly when a datum falls exactly on a partition: the closed-form `floor(... + 1)` puts boundary values in the upper bin, while the linear search's strict `>` keeps them in the lower bin. If no datum lands exactly on a partition, the two agree.
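The discrepancy at a datum sitting exactly on a partition (40 in the example above) is easy to check by hand:

```python
import math

datum, partition_start, partition_step = 40, 10, 10

# linear scan (quantize above): `40 > 40` is False, so the index stops at 3
scan_index = 0
for boundary in range(10, 90, 10):
    if datum > boundary:
        scan_index += 1

# closed form (opt_quantize above): floor((40 - 10) / 10 + 1) = floor(4) = 4
closed_index = int(math.floor((datum - partition_start) / partition_step + 1))

print(scan_index, closed_index)  # 3 4
```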


That reduces the running time from O(mn) to O(n), where n is the length of the signal and m is the number of partitions. That should give you a significant performance boost. Can we do better?

Yes. With our new mathematics-based approach, the code is easily vectorized, and we can make Numpy do the hard work:

import numpy as np

def np_quantize(signal, num_quanta, partition_start, partition_step,
                codebook_start, codebook_step):
    signal = np.asarray(signal, dtype=float)
    # floor((x - start) / step + 1), vectorized, then clipped into range
    indices = np.floor((signal - partition_start) / partition_step + 1)
    indices = np.clip(indices, 0, num_quanta - 1).astype(int)
    quanta = codebook_start + codebook_step * indices.astype(float)
    return indices, quanta

I did, by chance, benchmark it, and it appears that each of my optimizations has made it slower, so either I'm doing something horribly wrong or I'm not testing on data large enough to amortize the constant.

~$ python -m timeit -s 'from quantize import orig_quantize' 'orig_quantize([-3, -2, -1, 0, 1, 2, 3], [-0.5, 0.5], [-1, 0, 1])'
100000 loops, best of 3: 8.58 usec per loop
~$ python -m timeit -s 'from quantize import opt_quantize' 'opt_quantize([-3, -2, -1, 0, 1, 2, 3], 3, -0.5, 1, -1, 1)'
100000 loops, best of 3: 10.8 usec per loop
~$ python -m timeit -s 'from quantize import np_quantize' 'np_quantize([-3, -2, -1, 0, 1, 2, 3], 3, -0.5, 1, -1, 1)'
10000 loops, best of 3: 57.4 usec per loop
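One way to test the amortization hypothesis is to repeat the comparison on a much larger signal (the array size here is an arbitrary choice). The sketch below re-implements the two closed-form variants so it is self-contained, and checks that they agree before timing them:

```python
import timeit
import numpy as np

def loop_quantize(signal, num_quanta, p_start, p_step):
    # pure-Python closed form, same math as opt_quantize above
    out = []
    for datum in signal:
        index = int(np.floor((datum - p_start) / p_step + 1))
        out.append(min(max(index, 0), num_quanta - 1))
    return out

def vec_quantize(signal, num_quanta, p_start, p_step):
    # vectorized closed form, same math
    indices = np.floor((np.asarray(signal, dtype=float) - p_start) / p_step + 1)
    return np.clip(indices, 0, num_quanta - 1).astype(int)

rng = np.random.default_rng(0)
signal = rng.uniform(-4, 4, 100_000)

# sanity check: both variants produce identical indices
assert list(vec_quantize(signal, 3, -0.5, 1)) == loop_quantize(signal, 3, -0.5, 1)

for name, fn in [("loop", loop_quantize), ("numpy", vec_quantize)]:
    t = timeit.timeit(lambda: fn(signal, 3, -0.5, 1), number=3)
    print(f"{name}: {t:.4f}s")
```

At this size the per-call NumPy overhead should be negligible relative to the work, so the vectorized version would be expected to come out well ahead; exact numbers will vary by machine.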

For kicks, I tried using Cython as well as Numpy:

cimport cython
import numpy as np
cimport numpy as np

cdef extern from "math.h":
    float floorf(float)


@cython.boundscheck(False)
def cynp_quantize(np.ndarray[float, ndim=1] signal, int num_quanta,
                  float partition_start, float partition_step,
                  float codebook_start, float codebook_step):
    cdef int i
    cdef int index
    cdef np.ndarray[np.int_t, ndim=1] indices = np.empty_like(signal, dtype=int)
    cdef np.ndarray[float, ndim=1] quanta = np.empty_like(signal)
    for i in range(signal.shape[0]):
        index = <int>floorf((signal[i] - partition_start)
                            / partition_step + 1.0)
        if index < 0:
            index = 0
        if index >= num_quanta:
            index = num_quanta - 1
        indices[i] = index
        quanta[i] = codebook_start + index * codebook_step
    return indices, quanta

From what I gather, Cython also experimentally supports OpenMP, which would let it do everything with multiple threads. I was unable to test the performance of this Cython solution, though, with or without threads (I'm missing a header file needed to compile the result).
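In the same spirit, the closed-form version also parallelizes easily with the standard-library `multiprocessing` module, without needing Cython or OpenMP. A sketch (the chunking strategy, worker count, and function names are my own choices, and on platforms that spawn rather than fork, the call must sit under an `if __name__ == '__main__':` guard):

```python
import numpy as np
from multiprocessing import Pool

def _quantize_chunk(args):
    # closed-form quantization of one chunk (same math as opt_quantize above)
    chunk, num_quanta, p_start, p_step, c_start, c_step = args
    indices = np.clip(np.floor((chunk - p_start) / p_step + 1),
                      0, num_quanta - 1).astype(int)
    return indices, c_start + c_step * indices

def parallel_quantize(signal, num_quanta, p_start, p_step, c_start, c_step,
                      workers=4):
    # split the signal into roughly equal chunks, one per worker
    chunks = np.array_split(np.asarray(signal, dtype=float), workers)
    args = [(c, num_quanta, p_start, p_step, c_start, c_step) for c in chunks]
    with Pool(workers) as pool:
        results = pool.map(_quantize_chunk, args)
    indices = np.concatenate([i for i, _ in results])
    quanta = np.concatenate([q for _, q in results])
    return indices, quanta
```

For a single quantization the process startup cost will dominate; this only pays off when each chunk carries substantial work, or when quantizing many files at once (in which case mapping over whole files is simpler still).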

icktoofay
  • thanks a lot icktoofay, but I am a bit confused by these arguments `1:step:1-step, -1:step:1`. As quantiz is a built-in function in MATLAB, I don't know exactly how it operates on the signal. So though it's just a function, it has built-in logic that I am finding difficult to implement. – Mahesh Galgalikar Apr 13 '13 at 04:59
  • @MaheshGalgalikar: That's how you do ranges in MATLAB. Roughly, you can translate `a:b:c` to [`range(a, c, b)`](http://docs.python.org/3.3/library/stdtypes.html#typesseq-range). – icktoofay Apr 13 '13 at 05:00
  • the quantiz function I am using in MATLAB takes step=0.0078, but in Python `range` will not accept a float value; all values must be integers. – Mahesh Galgalikar Apr 13 '13 at 05:39
  • @MaheshGalgalikar: See [this question](http://stackoverflow.com/q/7267226). If you use the function in the first answer, you'll probably need to use `list(frange(...))` rather than just `range(...)`. The Numpy `arange` should not need any special `list` stuff. – icktoofay Apr 13 '13 at 05:40
  • @icktoofay: Thanks a lot, will let you know the progress – Mahesh Galgalikar Apr 13 '13 at 05:49
  • @icktoofay: thanks a lot, the function is running properly, but it's taking a long time to execute... In MATLAB, the range and step size I have specified produce 7,20,000 values. Python does the same, but the execution time is, say, 1000 times longer. Is there any way to reduce it? I have to process a file and compare it with the results of 21 files stored in a database. Is there any way the process could be parallelized? – Mahesh Galgalikar Apr 14 '13 at 14:04
  • @MaheshGalgalikar: Well, for simple ranges, there are a few optimizations that can be done. These optimizations turn out to allow vectorization quite easily, although I'm not sure if Numpy would take advantage of it. I've added these to my answer. If you want to run a bunch of quantizations in parallel, you may want to look into [the `multiprocessing` module](http://docs.python.org/2/library/multiprocessing.html). Furthermore, if you want to distribute across machines, then there's ZeroMQ among other solutions… – icktoofay Apr 14 '13 at 20:53
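Pulling the comments together: with NumPy available, MATLAB's float-step ranges translate directly to `np.arange` rather than `frange`. A sketch (the half-step nudge is a common workaround, since `arange` excludes its stop value and float rounding can otherwise drop the endpoint; the helper name is my own):

```python
import numpy as np

step = 0.0078  # the step size mentioned in the comments above

# MATLAB's a:step:c includes both endpoints (when they land on the grid);
# np.arange excludes the stop, so extend it by half a step
def mrange(a, step, c):
    return np.arange(a, c + step / 2, step)

codebook = mrange(-1, step, 1)
```

For long or precision-sensitive grids, `np.linspace` with an explicit element count is often the safer choice, since it avoids accumulating float rounding in the step.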