
There is a clear memory leak in my code that causes my used memory to go from 5 GB to 15.7 GB in a span of 40-60 seconds, then crashes my program with an OOM error. I believe this happens when I am creating tensors to form the dataset, not when I am training the model. My data consists of 25,000 images stored locally. As such, I used the built-in tensorflow.js function tf.data.generator(generator) described here to create the dataset. I believe this is the best and most efficient way to create a large dataset, as mentioned here.

Example

I used a helper class to create my dataset by passing in the path to the images

class Dataset{

    constructor(dirPath){
        this.paths = this.#generatePaths(dirPath);
    }

    // Generate file paths for all images to be read as buffer
    #generatePaths = (dirPath) => {
        const dir = fs.readdirSync(dirPath, {withFileTypes: true})
            .filter(dirent => dirent.isDirectory())
            .map(folder => folder.name)
        let imagePaths = [];
        dir.forEach(folder => {
            fs.readdirSync(path.join(dirPath, folder)).filter(file => {
                return path.extname(file).toLocaleLowerCase() === '.jpg'
            }).forEach(file => {
                imagePaths.push(path.resolve(path.join(dirPath, folder, file)))
            })
        })
        return imagePaths;
    }

    // Convert image buffer to a Tensor object
    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))
    }

    // Label the data with the corresponding class
    #labelArray(index){return Array.from({length: 2}, (_, k) => k === index ? 1 : 0)};

    // Javascript generator function passed to tf.data.generator()
    * #imageGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            let image;
            try {
                image = this.#generateTensor(this.paths[i]);
            } catch (error) {
                continue;
            }
            console.log(tf.memory());
            yield image;
        }
    }

    // Javascript generator function passed to tf.data.generator()
    * #labelGenerator(){
        for(let i=0; i<this.paths.length; ++i){
            const classIndex = (path.basename(path.dirname(this.paths[i])) === 'Cat' ? 0 : 1);
            const label = tf.tensor1d(this.#labelArray(classIndex), 'int32')
            console.log(tf.memory());
            yield label;
        }
    }

    // Load data
    loadData = () => {
        console.log('\n\nLoading data...')
        const xs = tf.data.generator(this.#imageGenerator.bind(this));
        const ys = tf.data.generator(this.#labelGenerator.bind(this));
        const ds = tf.data.zip({xs, ys}).batch(32).shuffle(32);
        return ds;
    }
}

And I am creating my dataset like this:

const trainDS = new Dataset(trainPath).loadData();

Question

I am aware of built-in tfjs methods to manage memory such as tf.tidy() and tf.dispose(). However, I was unable to implement them in such a way to stop the memory leak, as the tensors are generated by the tf.data.generator function.

How would I go about successfully disposing the tensors from memory after they are yielded by the generators?

HyperVS

1 Answer


Every tensor you create, you need to dispose of: there is no garbage collection as you're used to in JS. That's because tensors are not kept in JS memory (they can live in GPU memory, a WASM module, etc.), so the JS engine cannot track them. They are more like pointers than normal variables.

For example, in your code:

        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))

each chained operation creates an interim tensor that never gets disposed. Read it this way:

const decoded = tf.node.decodeJpeg(buffer, 3)
const resized = decoded.resizeNearestNeighbor([128, 128])
const casted = resized.toFloat();
const normalized = casted.div(tf.scalar(255.0))
return normalized;

so you have four large tensors allocated somewhere (plus the small tf.scalar(255.0)). What you're missing is:

tf.dispose([decoded, resized, casted]);

and later, when you're done with the image, also tf.dispose(image), which disposes normalized.

And the same goes for everything else that is a tensor.

I am aware of built-in tfjs methods to manage memory such as tf.tidy() and tf.dispose(). However, I was unable to implement them in such a way to stop the memory leak, as the tensors are generated by the tf.data.generator function.

You say you're aware, but you're doing exactly the same thing by creating interim tensors you're not disposing.

You can help yourself by wrapping such functions in tf.tidy(), which creates a local scope so everything that is not returned gets released automatically.

for example:

    #generateTensor = (imagePath) => tf.tidy(() => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0))
    });

which means the interim tensors will get disposed of, but you still need to dispose of the return value once you're done with it.

Vladimir Mandic
  • Yeah, wrapping the generate tensor in a tf.tidy() solved my problem. Thanks for the thorough answer and explanation. – HyperVS Dec 07 '21 at 18:28