
I wonder if it is possible to save and load tensors in tensorflow.js in order to avoid recalculating them for each batch. The problem is that my GPU is barely used, because it has to wait for the CPU to transform my array into a tensor before the training.

My workflow currently looks like this:

  1. loading the dataset (reading from HDD into an array) (1-2 seconds)

  2. CPU transforming the array to a tensor (takes a long time)

  3. GPU training (takes 1 second or less)

  4. unloading / tidy (5 seconds, also a bit too long)

  5. repeat

EDIT: Here is some code, with the problematic lines (meaning long, heavy computation) and the unproblematic lines commented:

async function learn_on(ep){

    for (var learn_ep = ep+1; learn_ep <= 1200; learn_ep++) {
        var batch_start = 0;

        var mini_batch_in = [];
        var mini_batch_out = [];

        var shuffle_arr=[];
        for(var i=0;i<in_tensor_sum.length;i++){
            shuffle_arr.push(i); // needs no time
        }

        shuffle_arr=F_shuffle_array(shuffle_arr); // needs no time

        // in_tensor_sum / out_tensor_sum are just 2-dimensional arrays: data set number, data points
        for (var batch_num = batch_start; batch_num < in_tensor_sum.length; batch_num++) {

            mini_batch_in.push(in_tensor_sum[shuffle_arr[batch_num]]); // very fast also
            mini_batch_out.push(out_tensor_sum[shuffle_arr[batch_num]]);// very fast also

            if (batch_num + 1 == batch_start + 250 || batch_num == in_tensor_sum.length - 1) {
                //possible to import/export xs/ys?
                var xs = tf.tensor(mini_batch_in); // CPU-heavy computation here: takes a long time (9600 input units)
                var ys = tf.tensor(mini_batch_out); // also CPU-heavy, but faster because the output size is small (just 400)

                // GPU acceleration starts here: super fast, only one second!
                await model.fit(xs, ys, {
                    epochs: 1, shuffle: true,
                    callbacks: {
                        onEpochEnd: async (epoch, log) => {
                            console.log(`${batch_num}:|Epoch ${learn_ep}: | set: ${batch_num / in_tensor_sum.length} | loss = ${log.loss}`);                          
                        },
                        onTrainEnd: async () => {

                        }
                    }
                });
                //avoid memory leaks START (also takes a little time!)
                await tf.tidy(() => {
                    tf.tensor([xs, ys]);
                    console.log('numTensors (inside tidy): ' + tf.memory().numTensors);
                });

                console.log('numTensors (outside tidy): ' + tf.memory().numTensors);
                xs.dispose();
                ys.dispose();
                console.log('numTensors (after dispose): ' + tf.memory().numTensors);

                batch_start = batch_num + 1;
                mini_batch_in = [];
                mini_batch_out = [];
                //avoid memory leaks END

            }


        }

    }
}

EDIT 2:

I have now tried to use 'tfjs-npy' to save and load the tensor, but I get an error:

.
.
.
var xs = await tf.tensor(mini_batch_in);
var ys = await tf.tensor(mini_batch_out);

var fs = require('fs');
var tf_parser = require('tfjs-npy');

var writeTO = await tf_parser.serialize(ys);
await fs.writeFileSync('/home/test/NetBeansProjects/ispeed_tensload/save_tensors/test.js', new Buffer(writeTO));

var tensor_data = await fs.readFileSync("/home/test/NetBeansProjects/ispeed_tensload/save_tensors/test.js");
var my_arrayBuffer = new Uint8Array(tensor_data).buffer;
var ys2 = await tf_parser.parse(my_arrayBuffer);


await model.fit(xs, ys2, {....

The error:

(node:26576) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'values' of undefined
    at NodeJSKernelBackend.getInputTensorIds (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:142:26)
    at NodeJSKernelBackend.executeSingleOutput (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:186:73)
    at NodeJSKernelBackend.gather (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:965:21)
    at environment_1.ENV.engine.runKernel.$x (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/segment_ops.js:56:84)
    at /home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:129:26
    at Engine.scopedRun (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:101:23)
    at Engine.runKernel (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/engine.js:127:14)
    at gather_ (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/segment_ops.js:56:38)
    at Object.gather (/home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:23:29)
    at /home/test/NetBeansProjects/ispeed_tensload/node_modules/@tensorflow/tfjs-layers/dist/backend/tfjs_backend.js:275:20

I guess there is a mismatch in the format that 'tfjs-npy' produces, but I don't know. Another acceptable solution would be to let the tensor-creating process run on multiple threads (C++ back-end, optimized) while the GPU is training, to reduce the idle time to a minimum. But I don't know if that is possible. The creating process currently runs single-threaded in the node.js process only, which performs very poorly.
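As a possible workaround (a minimal sketch, not tested against this exact setup): instead of tfjs-npy, one could persist the tensor's underlying Float32Array as raw bytes and rebuild the tensor later with an explicit shape. The function names and file paths here are illustrative, not from the thread.

    const fs = require('fs');
    const tf = require('@tensorflow/tfjs-node');

    // Save the raw values of a tensor to disk (the shape must be stored or known separately).
    async function saveTensor(tensor, path) {
        const data = await tensor.data(); // Float32Array with the tensor's values
        fs.writeFileSync(path, Buffer.from(data.buffer, data.byteOffset, data.byteLength));
    }

    // Rebuild a tensor from the raw bytes plus the known shape.
    function loadTensor(path, shape) {
        const buf = fs.readFileSync(path);
        // Copy into a fresh ArrayBuffer so the Float32Array view is 4-byte aligned.
        const values = new Float32Array(buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength));
        return tf.tensor(values, shape);
    }

    // e.g. var ys2 = loadTensor('save_tensors/test.bin', [250, 400]);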

  • 1
    Is it possible to show your code ? – edkeveked Feb 24 '19 at 16:58
  • Does this help? – user3776738 Feb 24 '19 at 18:37
  • In the steps you listed above, why do you have to repeat step 1? Why can't the reading from HDD happen just once? – Shanqing Cai Feb 25 '19 at 02:50
  • because it would not fit into ram/( node.js doesn't allow too big arrays),so I have to read the full data set step by step. But I would rather read the fully prepared tensor. I think the calculated tensor needs like 4x-5x times the size of the plain array. But reading is faster than calculating it. – user3776738 Feb 25 '19 at 08:36
  • It would also help if just could use multithreading for the tensor creating process, even while the gpu is doing the training, to minimize the GPU idle status. – user3776738 Feb 25 '19 at 17:18

1 Answer


The memory used by node.js can be increased with the flag --max-old-space-size, as indicated here. There is no issue with either node.js or tensorflow.js regarding that; the only limitation might be the capacity of your memory, and that might be the only reason for going back and forth to read your data.
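For example (the script name train.js here is hypothetical), this raises the V8 heap limit to roughly 8 GB:

    node --max-old-space-size=8192 train.js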

Having said that, it is unclear what is being done here:

    await tf.tidy(() => {
        tf.tensor([xs, ys]);
        console.log('numTensors (inside tidy): ' + tf.memory().numTensors);
    });

It is useless because:

  • The tensor is created and then immediately disposed of.

  • Since xs and ys are not array-like, tf.tensor([xs, ys]) will create a tensor of 2 NaN values. It has no influence on the performance of the code.

The tensors xs and ys are effectively disposed of by xs.dispose() and ys.dispose() respectively.
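In other words, a minimal sketch of what the answer suggests, reusing the question's variable names: the tf.tidy block can simply be dropped.

    var xs = tf.tensor(mini_batch_in);
    var ys = tf.tensor(mini_batch_out);

    await model.fit(xs, ys, { epochs: 1, shuffle: true });

    // xs and ys were created outside any tidy scope, so explicit
    // dispose() calls are what actually free their memory.
    xs.dispose();
    ys.dispose();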

  • Without the tf.tidy, my 16 GB of RAM just runs full after a few loops, because tf.memory().numTensors gets bigger and bigger without it. The creation process of the tensor is really slow, so I can't use the full potential of my GPU. (It's just a 1060 GTX with 6 GB VRAM, not even a Titan ;) Using multithreading on the tfjs-node back-end would shorten the GPU idle time, but the (non-GPU / C++) back-end doesn't run when I use the GPU back-end. Only the internal javascript/node.js (only 1 thread!) is used for creating the tensor. – user3776738 Feb 25 '19 at 19:56
  • 1
    @user3776738, `tf.tensor([xs, ys]);` is created but not used. You could consider removing all the code inside `tf.tidy`. Loading the data into the memory and the gpu is not cost-free. The more data there is, the longer it will take . But once it is loaded, you can do what you want with the backend used. – edkeveked Feb 25 '19 at 21:20
  • But when I remove it, nothing is done inside of tidy?! And then the RAM fills up and crashes. Why is RAM to VRAM and back so slow? RAM works at many GB/s, while VRAM even works at many tens (hundreds?) of GB/s. The problem is the calculation from array to tensor. This needs to be pre-processed for the whole data set (many GBs), or at least accelerated by multithreading, to avoid too much GPU idle time. – user3776738 Feb 25 '19 at 22:41
  • 1
    @user3776738 How big is your tensor and how long does it take to load ? – edkeveked Feb 26 '19 at 05:20
  • The input tensor has 9600 units and the output has 400, multiplied by 250 data points (batch size). – user3776738 Feb 26 '19 at 08:16
  • *2500 data points. And it takes 5-6 seconds to load both (input and output). – user3776738 Feb 26 '19 at 08:32
  • 1
    It turns out that I'm just super stupid: All I have to do was just to create tensor after tensor into an array, and than use it to train on it.I just had some memory issues in the past but maybe I was just reading and concatenating everything on everything,which killed my ram.This was why I didn't considered to do it this way,because I thought that tensors are somehow 8x or even more bigger than the input array. – user3776738 Feb 26 '19 at 19:51
  • 1
    It happens even to the best of us. Happy coding :) – edkeveked Feb 26 '19 at 19:54
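For completeness, here is a minimal sketch of the approach described in the last comments (variable names follow the question's code; this is illustrative, not the asker's actual solution): convert every mini-batch to tensors once, keep them in an array, and reuse them across all epochs, so the slow array-to-tensor conversion is paid only a single time.

    // Build each batch tensor once, up front (run inside an async function).
    var batches = [];
    for (var start = 0; start < in_tensor_sum.length; start += 250) {
        var end = Math.min(start + 250, in_tensor_sum.length);
        batches.push({
            xs: tf.tensor(in_tensor_sum.slice(start, end)),  // paid once, not per epoch
            ys: tf.tensor(out_tensor_sum.slice(start, end))
        });
    }

    // Reuse the same tensors every epoch; the GPU no longer waits on the CPU.
    for (var learn_ep = 1; learn_ep <= 1200; learn_ep++) {
        for (var b = 0; b < batches.length; b++) {
            await model.fit(batches[b].xs, batches[b].ys, { epochs: 1, shuffle: true });
        }
    }

    // Dispose only after all training is finished:
    // batches.forEach(function (batch) { batch.xs.dispose(); batch.ys.dispose(); });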