34

I want to make an image classifier, but I don't know Python. TensorFlow.js works with JavaScript, which I am familiar with. Can models be trained with it, and what would be the steps to do so? Frankly I have no clue where to start.

The only thing I figured out is how to load "mobilenet", which apparently is a set of pre-trained models, and classify images with it:

const tf = require('@tensorflow/tfjs'),
      mobilenet = require('@tensorflow-models/mobilenet'),
      tfnode = require('@tensorflow/tfjs-node'),
      fs = require('fs-extra');

const imageBuffer = await fs.readFile(......),
      tfimage = tfnode.node.decodeImage(imageBuffer),
      mobilenetModel = await mobilenet.load();  

const results = await mobilenetModel.classify(tfimage);

which works, but it's no use to me because I want to train my own model using my images with labels that I create.

=======================

Say I have a bunch of images and labels. How do I use them to train a model?

const myData = JSON.parse(await fs.readFile('files.json'));

for(const data of myData){
  const image = await fs.readFile(data.imagePath),
        labels = data.labels;

  // how to train, where to pass image and labels ?

}
Alex
  • Where are you facing the problem? If you have loaded TensorFlow, you can train your own model. – Abhishek Anand Nov 20 '19 at 11:45
  • It seems like you can train models with tensorflow.js: https://www.tensorflow.org/js/guide/train_models. I used TensorFlow with Python. If TensorFlow.js is not using the GPU, training might take a long time. For me, https://colab.research.google.com/ was a useful resource because it is free and provides 11 GB of GPU memory. – canbax Nov 20 '19 at 11:45
  • This is too broad a question... As pointed out in [the docs](https://www.tensorflow.org/js/tutorials), you can use [ml5](https://ml5js.org/) to [train](https://learn.ml5js.org/docs/#/reference/neural-network?id=train) a model or use TF.js directly, like in [this Node.js example](https://www.tensorflow.org/js/tutorials/setup#see-sample-code-for-node.js-usage) (expand sample code to see a training example). – jdehesa Nov 20 '19 at 11:46
  • But I don't see anywhere in that code how to pass the images and labels? – Alex Nov 20 '19 at 11:47
  • @Alex They are passed to the [`fit`](https://js.tensorflow.org/api/latest/#tf.LayersModel.fit) method, or in the dataset passed to [`fitDataset`](https://js.tensorflow.org/api/latest/#tf.LayersModel.fitDataset), as shown in the examples. – jdehesa Nov 20 '19 at 12:14
  • So `xs` would be my image data and `ys` the labels? – Alex Nov 20 '19 at 12:40
  • @Alex That's right, check out the linked documentation of the different methods. – jdehesa Nov 20 '19 at 14:31
  • You keep driving me to look at the docs, but I did and it's not obvious at all. I don't even understand how TF differentiates between kinds of data, like how it knows what's an image and what's plain text... – Alex Nov 20 '19 at 15:09
  • From the point of view of TF it is pretty much the same whether you train with text or images. The logic of classifying is the same; only the tensors representing the `xs` are different. – mico Dec 12 '19 at 17:15
  • Hey @Alex, I have the same requirement: I have to train my custom model. If you did it, can you please share a GitHub link or a blog link if you have any? – Dexter Feb 26 '20 at 04:09
  • Hi Dexter, I still haven't figured it out and am still working on it :( – Alex Mar 01 '20 at 10:08

4 Answers

30

First of all, the images need to be converted to tensors. The first approach is to create one tensor containing all the features (and, respectively, one tensor containing all the labels). This is the way to go only if the dataset contains few images.

  const imageBuffer = await fs.readFile(feature_file);
  const tensorFeature = tfnode.node.decodeImage(imageBuffer); // create a tensor for the image

  // create an array of all the features
  // by iterating over all the images
  const tensorFeatures = tf.stack([tensorFeature, tensorFeature2, tensorFeature3]);

The labels would be an array indicating the type of each image

 const labelArray = [0, 1, 2]; // e.g. 0 for dog, 1 for cat and 2 for bird

One now needs to create a one-hot encoding of the labels

 const tensorLabels = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 3);

Once the tensors are created, one needs to define the model for training. Here is a simple model:

const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [height, width, numberOfChannels], // numberOfChannels = 3 for color images, 1 for grayscale
  filters: 32,
  kernelSize: 3,
  activation: 'relu',
}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
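
Note that the model also has to be compiled before it can be trained; the optimizer and loss below are just common choices for a multi-class, one-hot setup, not the only possibility:

model.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
});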

Then the model can be trained:

await model.fit(tensorFeatures, tensorLabels)

If the dataset contains a lot of images, one would need to create a tf.data.Dataset instead. This answer discusses why.

const genFeatureTensor = async imagePath => {
      const imageBuffer = await fs.readFile(imagePath);
      return tfnode.node.decodeImage(imageBuffer);
}

const labelArray = indice => Array.from({length: numberOfClasses}, (_, k) => k === indice ? 1 : 0)

async function* dataGenerator() {
  const numElements = numberOfImages;
  let index = 0;
  while (index < numElements) {
    // imagePath and classImageIndex stand for the path and class index
    // of the image at position `index`
    const feature = await genFeatureTensor(imagePath);
    const label = tf.tensor1d(labelArray(classImageIndex));
    index++;
    yield {xs: feature, ys: label};
  }
}

const ds = tf.data.generator(dataGenerator).batch(1); // choose an appropriate batch size

Then use `model.fitDataset(ds)` to train the model.
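
For instance (the number of epochs is just an illustrative value):

await model.fitDataset(ds, {epochs: 10});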


The above is for training in Node.js. To do such processing in the browser, genFeatureTensor can be written as follows:

function loadImage(url){
  return new Promise((resolve, reject) => {
    const im = new Image();
    im.crossOrigin = 'anonymous';
    im.src = url;
    im.onload = () => resolve(im);
    im.onerror = reject;
  });
}

const genFeatureTensor = async url => {
  const img = await loadImage(url);
  return tf.browser.fromPixels(img);
};

One word of caution is that doing heavy processing might block the main thread in the browser. This is where web workers come into play.

edkeveked
  • the width and height from the inputShape must match the width and height of the images? So I can't pass images with different dimensions? – Alex Nov 25 '19 at 11:05
  • Yes, they must match. If you have images with a width and height different from the inputShape of the model, you will need to resize the image using `tf.image.resizeBilinear`. – edkeveked Nov 25 '19 at 11:14
  • Error: Operands could not be broadcast together with shapes 4,10 and 4,98,198,10. – Alex Dec 10 '19 at 17:39
  • @Alex Could you please update your question with the model summary and the shape of the image you are loading? All the images need to have the same shape, or the image would need to be resized for the training. – edkeveked Dec 11 '19 at 11:58
  • Hi @edkeveked, can you provide an example of a custom object detector? I tried many examples; they all use some pretrained model as the base model, so the image size is becoming an issue: I have small objects in a big image, and resizing them (for MobileNet, to 224px x 224px) gives worse results. I want to train from scratch. I will be very grateful for any help. – Pranoy Sarkar Dec 13 '19 at 10:55
  • @PranoySarkar, are you talking about object detection or image classification? MobileNet is often used for image classification. If you think that using it is worsening the prediction because of the input shape, why not write your own model using convolutional layers? But if you want a more specific answer, consider asking your own question. – edkeveked Dec 13 '19 at 11:41
  • Hi @edkeveked, I am talking about object detection. I have added a new question here, please have a look: https://stackoverflow.com/questions/59322382/how-to-train-a-custom-object-detector-from-scratch-in-tensorflow-js – Pranoy Sarkar Dec 13 '19 at 12:15
  • I am using tf.image.resizeBilinear to resize all images to the same dimensions, and still get that error – Alex Dec 15 '19 at 16:37
  • @Alex, You will need to add a flatten layer before the last layer. You can close this thread and open a new one with the model and the error you are getting – edkeveked Dec 16 '19 at 14:02
10

Consider the example https://codelabs.developers.google.com/codelabs/tfjs-training-classfication/#0

What they do is:

  • take a BIG png image (a vertical concatenation of images)
  • take some labels
  • build the dataset (data.js)

then train

The building of the dataset is as follows:

  1. images

The big image is divided into vertical chunks of n images each (n being chunkSize).

Consider a chunkSize of 2.

Given the pixel matrix of image 1:

  1 2 3
  4 5 6

Given the pixel matrix of image 2:

  7 8 9
  1 2 3

The resulting array would be 1 2 3 4 5 6 7 8 9 1 2 3 (the two pixel matrices flattened and concatenated into one 1D array)

So basically at the end of the processing, you have a big buffer representing

[...Buffer(image1), ...Buffer(image2), ...Buffer(image3)]
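
A minimal sketch of that concatenation, assuming `images` is an array of per-image Float32Arrays of length IMAGE_SIZE (both names are mine):

// copy each image's pixel data into one big flat buffer
const xsBuffer = new Float32Array(images.length * IMAGE_SIZE);
images.forEach((imagePixels, i) => xsBuffer.set(imagePixels, i * IMAGE_SIZE));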

  2. labels

That kind of formatting is done a lot for classification problems. Instead of labelling with a number, they use a boolean (one-hot) array. To encode class 7 out of 10 classes we would use [0,0,0,0,0,0,0,1,0,0] // 1 in the 7th position, array 0-indexed
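
A tiny sketch of that mapping (the helper name is mine):

const toOneHot = (label, numClasses) =>
  Array.from({length: numClasses}, (_, k) => (k === label ? 1 : 0));

toOneHot(7, 10); // [0,0,0,0,0,0,0,1,0,0]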

What you can do to get started

  • Take your image (and its associated label)
  • Load your image into a canvas
  • Extract its associated pixel buffer (see the sketch after this list)
  • Concatenate all your images' buffers into one big buffer. That's it for xs.
  • Take all your associated labels, map them as boolean arrays, and concatenate them. That's it for ys.
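
Here is a hypothetical helper for the "load to canvas, extract buffer" steps above (the name, the grayscale choice and keeping only the red channel are my assumptions, not from the codelab):

function imageToBuffer(img, width, height) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0, width, height);              // resize while drawing
  const {data} = ctx.getImageData(0, 0, width, height); // RGBA bytes
  const buffer = new Float32Array(width * height);
  for (let i = 0; i < buffer.length; i++) {
    buffer[i] = data[i * 4] / 255;                      // red channel only, scaled to [0, 1]
  }
  return buffer;
}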

Below, I subclass MnistData and override load (the rest can be left as is, except in script.js, where you need to instantiate your own class instead).

I still generate 28x28 images, write a digit on each, and get perfect accuracy since I don't include noise or deliberately wrong labels.


import {MnistData} from './data.js'

const IMAGE_SIZE = 784;// actually 28*28...
const NUM_CLASSES = 10;
const NUM_DATASET_ELEMENTS = 5000;
const NUM_TRAIN_ELEMENTS = 4000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;


function makeImage (label, ctx) {
  ctx.fillStyle = 'black'
  ctx.fillRect(0, 0, 28, 28) // hardcoded, brrr
  ctx.fillStyle = 'white'
  ctx.fillText(label, 10, 20) // print a digit on the canvas
}

export class MyMnistData extends MnistData{
  async load() { 
    const canvas = document.createElement('canvas')
    canvas.width = 28
    canvas.height = 28
    let ctx = canvas.getContext('2d')
    ctx.font = ctx.font.replace(/\d+px/, '18px')
    let labels = new Uint8Array(NUM_DATASET_ELEMENTS*NUM_CLASSES)

    // in data.js, they use a batch of images (aka chunksize)
    // let's even remove it for simplification purpose
    const datasetBytesBuffer = new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
    for (let i = 0; i < NUM_DATASET_ELEMENTS; i++) {

      const datasetBytesView = new Float32Array(
          datasetBytesBuffer, i * IMAGE_SIZE * 4, 
          IMAGE_SIZE);

      // BEGIN our handmade label + its associated image
      // notice that you could loadImage( images[i], datasetBytesView )
      // so you could do them in bulk and await all the promises after the for loop
      const label = Math.floor(Math.random()*10)
      labels[i*NUM_CLASSES + label] = 1
      makeImage(label, ctx)
      const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
      // END you should be able to load an image to canvas :)

      for (let j = 0; j < imageData.data.length / 4; j++) {
        // NOTE: we store a 4-byte FLOAT in [0;1] even though we don't strictly need it
        // We could use a Uint8Array (since we work in grayscale) without scaling by 1/255
        // they probably did it so the code can be reused as-is for color images afterwards...
        datasetBytesView[j] = imageData.data[j * 4] / 255;
      }
    }
    this.datasetImages = new Float32Array(datasetBytesBuffer);
    this.datasetLabels = labels

    //below is copy pasted
    this.trainIndices = tf.util.createShuffledIndices(NUM_TRAIN_ELEMENTS);
    this.testIndices = tf.util.createShuffledIndices(NUM_TEST_ELEMENTS);
    this.trainImages = this.datasetImages.slice(0, IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.testImages = this.datasetImages.slice(IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
    this.trainLabels =
        this.datasetLabels.slice(0, NUM_CLASSES * NUM_TRAIN_ELEMENTS);// notice, each element is an array of size NUM_CLASSES
    this.testLabels =
        this.datasetLabels.slice(NUM_CLASSES * NUM_TRAIN_ELEMENTS);
  }

}
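
Once load() has run, one possible sketch for feeding these flat buffers to a model (this part is not from the codelab's data.js; `model` is assumed to be a compiled tf.LayersModel expecting [28, 28, 1] inputs, and the epoch/batch values are illustrative):

const data = new MyMnistData();
await data.load();

// wrap the flat typed arrays into tensors of the right shape
const xs = tf.tensor4d(data.trainImages, [NUM_TRAIN_ELEMENTS, 28, 28, 1]);
const ys = tf.tensor2d(data.trainLabels, [NUM_TRAIN_ELEMENTS, NUM_CLASSES]);

await model.fit(xs, ys, {epochs: 5, batchSize: 32, validationSplit: 0.15});
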
grodzi
8

I found a tutorial [1] on how to use an existing model to train new classes. The main code parts are here:

index.html head:

   <script src="https://unpkg.com/@tensorflow-models/knn-classifier"></script>
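
(If I remember the codelab right, the head also loads tfjs itself and the MobileNet model, roughly along these lines:)

   <script src="https://unpkg.com/@tensorflow/tfjs"></script>
   <script src="https://unpkg.com/@tensorflow-models/mobilenet"></script>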

index.html body:

    <button id="class-a">Add A</button>
    <button id="class-b">Add B</button>
    <button id="class-c">Add C</button>

index.js:

    const classifier = knnClassifier.create();

    ....

    // Reads an image from the webcam and associates it with a specific class
    // index.
    const addExample = async classId => {
           // Capture an image from the web camera.
           const img = await webcam.capture();

           // Get the intermediate activation of MobileNet 'conv_preds' and pass that
           // to the KNN classifier.
           const activation = net.infer(img, 'conv_preds');

           // Pass the intermediate activation to the classifier.
           classifier.addExample(activation, classId);

           // Dispose the tensor to release the memory.
          img.dispose();
     };

     // When clicking a button, add an example for that class.
    document.getElementById('class-a').addEventListener('click', () => addExample(0));
    document.getElementById('class-b').addEventListener('click', () => addExample(1));
    document.getElementById('class-c').addEventListener('click', () => addExample(2));

    ....
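
The `....` parts hide the setup and the prediction; here is a hedged sketch of what they roughly stand for in the codelab (the `<video id="webcam">` element id is an assumption):

    // load MobileNet and wrap the <video id="webcam"> element
    const net = await mobilenet.load();
    const webcam = await tf.data.webcam(document.getElementById('webcam'));

    // later: classify the current webcam frame with the trained KNN classifier
    const img = await webcam.capture();
    const result = await classifier.predictClass(net.infer(img, 'conv_preds'));
    console.log('predicted class index:', result.label);
    img.dispose();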

The main idea is to use the existing network to compute an intermediate activation (an embedding) for each image and then let the KNN classifier associate that activation with your own label.

The complete code is in the tutorial. Another promising, more advanced approach is in [2]. It needs strict preprocessing, so I only leave it here as a pointer; it is a much more advanced one.

Sources:

[1] https://codelabs.developers.google.com/codelabs/tensorflowjs-teachablemachine-codelab/index.html#6

[2] https://towardsdatascience.com/training-custom-image-classification-model-on-the-browser-with-tensorflow-js-and-angular-f1796ed24934

mico
  • Please take a look at my second answer; it is much closer to a realistic starting point. – mico Dec 13 '19 at 08:46
  • Why not put both answers into one? – edkeveked Dec 13 '19 at 11:44
  • They take quite different approaches to the same thing. The one above, where I am commenting now, is actually a workaround; the other one starts from the basics, which I now think is more appropriate for the question. – mico Dec 13 '19 at 14:17
6

TL;DR

MNIST is the Hello World of image recognition. After learning it by heart, these questions in your mind are easy to solve.


Question setting:

Your main question is

 // how to train, where to pass image and labels ?

inside your code block. For that I found a perfect answer in the examples section of TensorFlow.js: the MNIST example. The links below ([1], [2], [3]) have pure JavaScript and Node.js versions of it plus a Wikipedia explanation. I will go through them at the level necessary to answer your main question, and I will also add perspective on how your own images and labels relate to the MNIST image set and the examples that use it.

First things first:

Code snippets.

where to pass images (Node.js sample)

async function loadImages(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);

  const headerBytes = IMAGE_HEADER_BYTES;
  const recordBytes = IMAGE_HEIGHT * IMAGE_WIDTH;

  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], IMAGE_HEADER_MAGIC_NUM);
  assert.equal(headerValues[2], IMAGE_HEIGHT);
  assert.equal(headerValues[3], IMAGE_WIDTH);

  const images = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Float32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      // Normalize the pixel values into the 0-1 interval, from
      // the original 0-255 interval.
      array[i] = buffer.readUInt8(index++) / 255;
    }
    images.push(array);
  }

  assert.equal(images.length, headerValues[1]);
  return images;
}

Notes:

The MNIST dataset is one huge file in which many images are packed like tiles in a puzzle, all of the same size, side by side, like boxes in an x/y coordinate grid. Each box holds one sample, and the corresponding index in the labels array holds its label. Starting from this example, it is not a big deal to switch to a several-files format, so that only one image at a time is handed to the while loop.
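
To make that concrete, here is a hedged sketch of the several-files variant for your own data, reusing the tf/tfnode/fs requires from your question; the function name, the grayscale assumption and the requirement that images are already IMAGE_WIDTH x IMAGE_HEIGHT are mine:

async function loadImagesFromFiles(imagePaths) {
  const images = [];
  for (const imagePath of imagePaths) {
    const buffer = await fs.readFile(imagePath);
    const imageTensor = tfnode.node.decodeImage(buffer, 1); // 1 channel = grayscale
    // normalize 0-255 to 0-1 and flatten, matching the Float32Array format built above
    const pixels = tf.tidy(() =>
        imageTensor.div(255).reshape([IMAGE_HEIGHT * IMAGE_WIDTH]).dataSync());
    imageTensor.dispose();
    images.push(new Float32Array(pixels));
  }
  return images;
}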

Labels:

async function loadLabels(filename) {
  const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);

  const headerBytes = LABEL_HEADER_BYTES;
  const recordBytes = LABEL_RECORD_BYTE;

  const headerValues = loadHeaderValues(buffer, headerBytes);
  assert.equal(headerValues[0], LABEL_HEADER_MAGIC_NUM);

  const labels = [];
  let index = headerBytes;
  while (index < buffer.byteLength) {
    const array = new Int32Array(recordBytes);
    for (let i = 0; i < recordBytes; i++) {
      array[i] = buffer.readUInt8(index++);
    }
    labels.push(array);
  }

  assert.equal(labels.length, headerValues[1]);
  return labels;
}

Notes:

Here, the labels are also byte data in a file. In the JavaScript world, and with the approach you have in your starting point, the labels could just as well be a JSON array.
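
For instance, a small sketch with the labels coming from a JSON file instead of the MNIST binary (the file name and the 10-class assumption are mine):

const labelArray = JSON.parse(await fs.readFile('labels.json', 'utf8')); // e.g. [0, 2, 1, 9, ...]
const labelsTensor = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 10);    // 10 = number of classes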

train the model:

await data.loadData();

  const {images: trainImages, labels: trainLabels} = data.getTrainData();
  model.summary();

  let epochBeginTime;
  let millisPerStep;
  const validationSplit = 0.15;
  const numTrainExamplesPerEpoch =
      trainImages.shape[0] * (1 - validationSplit);
  const numTrainBatchesPerEpoch =
      Math.ceil(numTrainExamplesPerEpoch / batchSize);
  await model.fit(trainImages, trainLabels, {
    epochs,
    batchSize,
    validationSplit
  });

Notes:

Here, model.fit is the actual line of code that does the work: it trains the model.

Results of the whole thing:

  const {images: testImages, labels: testLabels} = data.getTestData();
  const evalOutput = model.evaluate(testImages, testLabels);

  console.log(
      `\nEvaluation result:\n` +
      `  Loss = ${evalOutput[0].dataSync()[0].toFixed(3)}; `+
      `Accuracy = ${evalOutput[1].dataSync()[0].toFixed(3)}`);
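
As an aside, once trained, classifying a single image could look like this sketch (it assumes imageTensor is a normalized 28x28x1 tensor, prepared like the training data):

  const prediction = model.predict(imageTensor.reshape([1, 28, 28, 1]));
  const predictedClass = prediction.argMax(-1).dataSync()[0];
  console.log(`Predicted class: ${predictedClass}`);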

Note:

In data science, here as much as anywhere, the most fascinating part is knowing how well the model survives the test of new, unlabeled data: can it label it correctly or not? That is what the evaluation step is for, and it now prints us some numbers.

Loss and accuracy: [4]

The lower the loss, the better a model (unless the model has over-fitted to the training data). The loss is calculated on training and validation and its interpretation is how well the model is doing for these two sets. Unlike accuracy, loss is not a percentage. It is a summation of the errors made for each example in training or validation sets.

..

The accuracy of a model is usually determined after the model parameters are learned and fixed and no learning is taking place. Then the test samples are fed to the model and the number of mistakes (zero-one loss) the model makes are recorded, after comparison to the true targets.


More information:

In the GitHub repositories, the README.md file links to a tutorial where everything in the example is explained in greater detail.


[1] https://github.com/tensorflow/tfjs-examples/tree/master/mnist

[2] https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node

[3] https://en.wikipedia.org/wiki/MNIST_database

[4] How to interpret "loss" and "accuracy" for a machine learning model

mico