There is a clear memory leak in my code: used memory climbs from 5 GB to 15.7 GB within 40-60 seconds, and the program then crashes with an OOM error. I believe the leak happens while I am creating the tensors that form the dataset, not while training the model. My data consists of 25,000 images stored locally. As such, I used the built-in TensorFlow.js function tf.data.generator(generator) described here to create the dataset. I believe this is the best and most efficient way to create a large dataset, as mentioned here.
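For context on why I chose this route: tf.data.generator() consumes a plain JavaScript generator function, which yields items lazily, one per request, so the whole dataset never has to sit in memory at once. A minimal sketch of that lazy pattern (plain Node.js, no tfjs; makeItem is a hypothetical stand-in for expensive work like decoding a JPEG):

```javascript
// Stand-in for expensive per-item work (e.g. decoding a JPEG into a tensor).
function makeItem(i) {
    return { id: i, data: `item-${i}` };
}

// Generator function: each item is only created when the consumer asks
// for it, so at most one item's worth of memory is live per step.
function* itemGenerator(count) {
    for (let i = 0; i < count; ++i) {
        yield makeItem(i);
    }
}

const seen = [];
for (const item of itemGenerator(3)) {
    seen.push(item.id); // items arrive lazily, in order
}
console.log(seen); // → [ 0, 1, 2 ]
```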
Example
I used a helper class to create my dataset, passing in the path to the images:
class Dataset {
    constructor(dirPath) {
        this.paths = this.#generatePaths(dirPath);
    }

    // Generate file paths for all images to be read as buffers
    #generatePaths = (dirPath) => {
        const dir = fs.readdirSync(dirPath, { withFileTypes: true })
            .filter(dirent => dirent.isDirectory())
            .map(folder => folder.name);
        let imagePaths = [];
        dir.forEach(folder => {
            fs.readdirSync(path.join(dirPath, folder)).filter(file => {
                return path.extname(file).toLocaleLowerCase() === '.jpg';
            }).forEach(file => {
                imagePaths.push(path.resolve(path.join(dirPath, folder, file)));
            });
        });
        return imagePaths;
    }

    // Convert an image buffer to a Tensor object
    #generateTensor = (imagePath) => {
        const buffer = fs.readFileSync(imagePath);
        return tf.node.decodeJpeg(buffer, 3)
            .resizeNearestNeighbor([128, 128])
            .toFloat()
            .div(tf.scalar(255.0));
    }

    // Label the data with the corresponding class (one-hot)
    #labelArray(index) { return Array.from({ length: 2 }, (_, k) => k === index ? 1 : 0); }

    // JavaScript generator function passed to tf.data.generator()
    * #imageGenerator() {
        for (let i = 0; i < this.paths.length; ++i) {
            let image;
            try {
                image = this.#generateTensor(this.paths[i]);
            } catch (error) {
                continue;
            }
            console.log(tf.memory());
            yield image;
        }
    }

    // JavaScript generator function passed to tf.data.generator()
    * #labelGenerator() {
        for (let i = 0; i < this.paths.length; ++i) {
            const classIndex = (path.basename(path.dirname(this.paths[i])) === 'Cat' ? 0 : 1);
            const label = tf.tensor1d(this.#labelArray(classIndex), 'int32');
            console.log(tf.memory());
            yield label;
        }
    }

    // Load data
    loadData = () => {
        console.log('\n\nLoading data...');
        const xs = tf.data.generator(this.#imageGenerator.bind(this));
        const ys = tf.data.generator(this.#labelGenerator.bind(this));
        const ds = tf.data.zip({ xs, ys }).batch(32).shuffle(32);
        return ds;
    }
}
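The #labelArray helper above builds a one-hot vector for the two classes. Written as a standalone function (outside the class, with a hypothetical numClasses parameter added for generality), the same idea is:

```javascript
// One-hot encode a class index into a fixed-length label array,
// the same idea as the #labelArray private method above.
function labelArray(index, numClasses = 2) {
    return Array.from({ length: numClasses }, (_, k) => (k === index ? 1 : 0));
}

console.log(labelArray(0)); // → [ 1, 0 ]  (class 0, e.g. 'Cat')
console.log(labelArray(1)); // → [ 0, 1 ]  (class 1)
```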
And I am creating my dataset like this:
const trainDS = new Dataset(trainPath).loadData();
Question
I am aware of the built-in tfjs methods for managing memory, such as tf.tidy() and tf.dispose(). However, I was unable to apply them in a way that stops the leak, since the tensors are produced inside the generators passed to tf.data.generator.
How would I go about successfully disposing the tensors from memory after they are yielded by the generators?
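To illustrate the pattern I am after, here is a sketch using a stand-in Tensor class with a dispose() method (so it runs without tfjs). The point is that the generator itself cannot dispose what it yields, because the consumer still needs each tensor after the yield; disposal has to happen on the consumer side once the tensor has been used, which is roughly what I would want tf.tidy()/tf.dispose() to achieve for the real tensors:

```javascript
// Stand-in for tf.Tensor: just tracks whether dispose() was called.
class FakeTensor {
    constructor(id) { this.id = id; this.disposed = false; }
    dispose() { this.disposed = true; }
}

// The generator yields tensors; it cannot dispose them itself,
// since the consumer still needs each one after the yield.
function* tensorGenerator(count) {
    for (let i = 0; i < count; ++i) {
        yield new FakeTensor(i);
    }
}

// Consumer pattern: use each tensor, then dispose it before pulling
// the next one, so only one tensor is live at a time.
const produced = [];
for (const t of tensorGenerator(3)) {
    produced.push(t);
    // ... use t here (e.g. feed it into a training step) ...
    t.dispose();
}
console.log(produced.every(t => t.disposed)); // → true
```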