
I have the following simple code:

import tensorflow as tf
import numpy as np

filename = [...]  # placeholder: a list of wav filenames
x = tf.placeholder(tf.string)

def mfcc(x):
    feature = ...  # placeholder: some function written in NumPy to convert a wav file to MFCC features
    return feature

mfcc_fn = lambda x: mfcc(x)

# create a training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x))
train_dataset = train_dataset.repeat()
train_dataset = train_dataset.map(mfcc_fn)
train_dataset = train_dataset.batch(100)
train_dataset = train_dataset.prefetch(buffer_size=1)

# create an iterator and iterate over training dataset
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_iterator = iterator.make_initializer(train_dataset)

with tf.Session() as sess:
    sess.run(train_iterator, feed_dict={x: filename})

Basically, the code creates a tf.data.Dataset object that loads a wav file and converts it to MFCC features. The data conversion happens at train_dataset.map(mfcc_fn), where I apply an MFCC function written in NumPy to all input data.

Apparently, the code doesn't work because NumPy doesn't support operations on a tf.placeholder object. Is it possible to map a function over the input of a tf.data.Dataset if the function has to be written in NumPy? The reason I don't use TensorFlow's built-in MFCC feature transformation is that the FFT function in TensorFlow gives significantly different output from its NumPy counterpart (as illustrated here), and the model I am building expects MFCC features generated using NumPy.
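To see why a plain NumPy function cannot be called directly inside map, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution rather than the placeholder/session API used in the question, and the helper names `record_mode` and `record_mode_eager` are made up for illustration. The function passed to map is traced into a graph, so it only ever sees symbolic tensors, while a function wrapped in tf.py_function runs as ordinary Python on concrete values.

```python
import numpy as np
import tensorflow as tf

modes = []  # (where, tf.executing_eagerly()) for each Python-level call


def record_mode(x):
    # Dataset.map traces this function into a graph, so x is symbolic here
    # and NumPy cannot read its values.
    modes.append(("map", tf.executing_eagerly()))
    return x


def record_mode_eager(x):
    # Inside tf.py_function the body runs eagerly on a concrete tensor.
    modes.append(("py_function", tf.executing_eagerly()))
    return x * 2.0


ds = tf.data.Dataset.from_tensor_slices(np.array([1.0, 2.0], dtype=np.float32))
ds = ds.map(record_mode)
ds = ds.map(lambda x: tf.py_function(record_mode_eager, [x], tf.float32))

values = [float(v.numpy()) for v in ds]
print(modes[0], values)
```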

Steven Chan

2 Answers


You can achieve that with the tf.py_func function, or with tf.py_function (the newer version). It does exactly what you want: it wraps your NumPy function, which operates on arrays, in a TensorFlow operation that you can include as part of your dataset graph.
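As a rough sketch of how this could look with filenames as dataset elements (assuming TensorFlow 2.x; `numpy_mfcc_stub`, `eager_wrapper` and the demo wav files are hypothetical stand-ins, and the FFT magnitude is only a placeholder for a real MFCC routine): the dataset holds strings only, and each file is loaded lazily inside the mapped function.

```python
import os
import tempfile
import wave

import numpy as np
import tensorflow as tf

SAMPLE_RATE = 8000  # hypothetical demo values
N_SAMPLES = 256


def write_demo_wav(path, freq):
    """Write a short mono 16-bit sine wave so the sketch has files to read."""
    t = np.arange(N_SAMPLES) / SAMPLE_RATE
    samples = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(samples.tobytes())


def numpy_mfcc_stub(path):
    """Stand-in for the NumPy MFCC function: just an FFT magnitude spectrum."""
    with wave.open(path, "rb") as f:
        raw = f.readframes(f.getnframes())
    signal = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return np.abs(np.fft.rfft(signal)).astype(np.float32)


def eager_wrapper(path_tensor):
    # tf.py_function passes an eager string tensor; .numpy() yields bytes.
    return numpy_mfcc_stub(path_tensor.numpy().decode("utf-8"))


def tf_map_fn(path):
    feat = tf.py_function(eager_wrapper, [path], tf.float32)
    feat.set_shape([N_SAMPLES // 2 + 1])  # TF cannot infer the shape itself
    return feat


tmp_dir = tempfile.mkdtemp()
filenames = []
for i, freq in enumerate([500.0, 1000.0, 1500.0]):
    p = os.path.join(tmp_dir, f"tone_{i}.wav")
    write_demo_wav(p, freq)
    filenames.append(p)

dataset = tf.data.Dataset.from_tensor_slices(filenames).map(tf_map_fn).batch(2)
batches = [b.numpy() for b in dataset]
print([b.shape for b in batches])
```

Only the filenames live in memory up front; the audio is read one element at a time as the dataset is iterated.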

Anis
  • The method does work for converting one NumPy array to another, but it doesn't seem to work if the input is a string (filenames). Is there another method? I can only load filenames rather than all the audio data at the very beginning, otherwise there will be an OOM error. – Steven Chan Apr 23 '19 at 09:17
  • It works with any kind of input. It takes what the dataset provides, meaning that if you want the inputs to be filenames, you should design your dataset to generate filenames. Datasets don't need to work with arrays. They work with anything that can be represented as a tensor, and strings can. – Anis Apr 23 '19 at 09:49
  • The dataset works with any kind of input but `tf.py_func` doesn't. If my inputs to the dataset are filenames, then the function will be mapped to every filename, which is of type string, and that results in an error because `tf.py_func` expects the input arguments to be NumPy arrays. – Steven Chan Apr 23 '19 at 09:58
  • Never mind, I figured it out. All I need to do is decode the files into arrays using TensorFlow's built-in functions. Thanks for the answer! – Steven Chan Apr 23 '19 at 10:19
  • Since the title says *numpy*, it would be nice to include the designated [`numpy_function`](https://www.tensorflow.org/api_docs/python/tf/numpy_function) in the answer. – Leander Aug 02 '19 at 15:31
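Putting the last two comments together, one possible sketch (assuming TensorFlow 2.x and 16-bit PCM wav files; `numpy_features`, `load_and_featurize` and the generated demo clips are hypothetical, and the log power spectrum only stands in for a real MFCC computation) decodes each file with TensorFlow's built-in ops and then hands the resulting array to NumPy via `tf.numpy_function`:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

SAMPLE_RATE = 8000  # hypothetical demo values
N_SAMPLES = 512


def numpy_features(signal):
    """Stand-in for the NumPy MFCC code: log power spectrum of the clip."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    return np.log(spectrum + 1e-6).astype(np.float32)


def load_and_featurize(path):
    # Decode the wav file with built-in TensorFlow ops...
    audio, _ = tf.audio.decode_wav(tf.io.read_file(path), desired_channels=1)
    signal = tf.squeeze(audio, axis=-1)  # float32 samples in [-1, 1]
    # ...then pass the decoded array to the NumPy function.
    feat = tf.numpy_function(numpy_features, [signal], tf.float32)
    feat.set_shape([N_SAMPLES // 2 + 1])  # shape must be restored by hand
    return feat


# Write two short sine-wave clips so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
paths = []
t = np.arange(N_SAMPLES, dtype=np.float32) / SAMPLE_RATE
for i, freq in enumerate([500.0, 1000.0]):
    tone = (0.5 * np.sin(2 * np.pi * freq * t)).astype(np.float32)
    path = os.path.join(tmp_dir, f"clip_{i}.wav")
    tf.io.write_file(path, tf.audio.encode_wav(tone[:, None], SAMPLE_RATE))
    paths.append(path)

dataset = tf.data.Dataset.from_tensor_slices(paths).map(load_and_featurize).batch(2)
features = next(iter(dataset)).numpy()
print(features.shape)
```

With this layout the NumPy function never sees a string, only a float array, which sidesteps the bytes-versus-array confusion discussed above.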

You can use a Python generator to handle the NumPy array and then pass it to tf.data.Dataset.from_generator.

For example:

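import cv2
import tensorflow as tf

def sample_generator(image_paths):
    for image_path in image_paths:
        # Values passed through `args` arrive as bytes, so decode before use
        img = cv2.imread(image_path.decode("utf-8"))
        # Do all the custom numpy things

        yield img

# image_paths: a list of image file paths, defined elsewhere
data_loader = tf.data.Dataset.from_generator(sample_generator,
                                             args=[image_paths],
                                             output_types=tf.int32,
                                             output_shapes=(None, None, 3))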

This will create a TensorFlow data loader from the Python generator. You can read more about this here.
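Adapted to the filenames-to-features case from the question, the same idea might look like the sketch below. It assumes TensorFlow 2.4 or later (for `output_signature`); `fake_numpy_mfcc`, `feature_generator`, the file names and the feature size are all hypothetical, and the random vector only stands in for loading a wav file and computing MFCCs in NumPy.

```python
import numpy as np
import tensorflow as tf

N_FEATURES = 13  # hypothetical feature size


def fake_numpy_mfcc(name):
    """Stand-in for 'load wav + compute MFCC in NumPy'; deterministic per name."""
    seed = sum(name.encode("utf-8"))
    rng = np.random.default_rng(seed)
    return rng.standard_normal(N_FEATURES).astype(np.float32)


def feature_generator(names):
    for name in names:
        # Arguments passed via `args` reach the generator as NumPy values,
        # so each filename arrives as bytes.
        yield fake_numpy_mfcc(name.decode("utf-8"))


names = ["a.wav", "b.wav", "c.wav"]  # hypothetical filenames
dataset = tf.data.Dataset.from_generator(
    feature_generator,
    args=[names],
    output_signature=tf.TensorSpec(shape=(N_FEATURES,), dtype=tf.float32),
).batch(2)

shapes = [tuple(b.shape) for b in dataset]
print(shapes)
```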