TFX IndexError on Evaluator component

Question

I'm trying to build an Evaluator for my model. Every other component has worked so far, but when I try this config:

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='Category'),
    ],
    metrics_specs=tfma.metrics.default_multi_class_classification_specs(),
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['Category'])
    ])

to make this evaluator:

model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))
context.run(model_resolver)

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)

I get this:

[...]
IndexError                                Traceback (most recent call last)
/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.DoFnRunner.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.PerWindowInvoker.invoke_process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/common.cpython-37m-darwin.so in apache_beam.runners.common._OutputProcessor.process_outputs()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.SingletonConsumerSet.receive()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.PGBKCVOperation.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/apache_beam/runners/worker/operations.cpython-37m-darwin.so in apache_beam.runners.worker.operations.PGBKCVOperation.process()

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/evaluators/metrics_and_plots_evaluator_v2.py in add_input(self, accumulator, element)
    355     for i, (c, a) in enumerate(zip(self._combiners, accumulator)):
--> 356       result = c.add_input(a, get_combiner_input(elements[0], i))
    357       for e in elements[1:]:

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/calibration_histogram.py in add_input(self, accumulator, element)
    141             flatten=True,
--> 142             class_weights=self._class_weights)):
    143       example_weight = float(example_weight)

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in to_label_prediction_example_weight(inputs, eval_config, model_name, output_name, sub_key, class_weights, flatten, squeeze, allow_none)
    283     elif sub_key.top_k is not None:
--> 284       label, prediction = select_top_k(sub_key.top_k, label, prediction)
    285 

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in select_top_k(top_k, labels, predictions, scores)
    621   if not labels.shape or labels.shape[-1] == 1:
--> 622     labels = one_hot(labels, predictions)
    623 

/opt/miniconda3/envs/archiving/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/metric_util.py in one_hot(tensor, target)
    671   # indexing the -1 and then removing it after.
--> 672   tensor = np.delete(np.eye(target.shape[-1] + 1)[tensor], -1, axis=-1)
    673   return tensor.reshape(target.shape)

IndexError: arrays used as indices must be of integer (or boolean) type

During handling of the above exception, another exception occurred:
[...]

IndexError: arrays used as indices must be of integer (or boolean) type [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/ComputePerSlice/ComputeUnsampledMetrics/CombinePerSliceKey/WindowIntoDiscarding']

I thought the problem was my config, but I can't see what is wrong with it.

I'm using the Kaggle BBC News Classification data set, and I followed the TFX Chicago Taxi notebook in order to serve my model with TensorFlow Serving.

Note: The model I'm using looks like this:

def _build_keras_model(vectorize_layer: TextVectorization) -> tf.keras.Model: 

  input_layer = tf.keras.layers.Input(shape=(1,), dtype=tf.string)

  deep = vectorize_layer(input_layer)
  deep = layers.Embedding(_max_features + 1, _embedding_dim)(deep)
  deep = layers.Dropout(0.5)(deep)
  deep = layers.GlobalAveragePooling1D()(deep)
  deep = layers.Dropout(0.5)(deep)

  output = layers.Dense(5, activation=tf.nn.softmax)(deep)

  model = tf.keras.Model(input_layer, output)
  model.compile(
      # The output layer already applies softmax, so the loss receives probabilities, not logits.
      loss=losses.SparseCategoricalCrossentropy(from_logits=False),
      optimizer='adam', 
      metrics=['accuracy'])
  model.summary(print_fn=absl.logging.info)  
  return model

score 2 · Accepted Answer · answered Aug 17 '20 at 12:35

I got it to work. The problem was that in the data set the label (the document category) is a string (e.g. "sport", "business", ...), so I used the Transform component to encode it as an integer.

However, when building the Evaluator component I passed it the examples from ExampleGen, on which no preprocessing had been done. So the evaluator was working with the raw string labels from ExampleGen instead of the integer labels the model was trained on, which is why the integer indexing in one_hot (see the traceback) failed.
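The failure can be reproduced with NumPy alone. The sketch below is a simplified stand-in for the indexing line shown in the traceback, not TFMA's actual function; the label values and the class count of 5 are illustrative assumptions.

```python
import numpy as np


def one_hot(labels, num_classes):
    # Same indexing pattern as the traceback line: index an identity
    # matrix by the labels, then drop the extra last column.
    return np.delete(np.eye(num_classes + 1)[labels], -1, axis=-1)


# Integer labels (what Transform produces) index the matrix without trouble.
int_labels = np.array([2, 0])
encoded = one_hot(int_labels, 5)
print(encoded)

# Raw string labels (what ExampleGen emits here) cannot be used as indices.
str_labels = np.array(["sport", "business"])
try:
    one_hot(str_labels, 5)
    error_message = None
except IndexError as err:
    error_message = str(err)
print(error_message)
```

With integer labels the result is a regular one-hot matrix; with string labels NumPy raises an IndexError, which matches the kind of error reported in the question.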

So, to fix this I simply did this:

model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))
context.run(model_resolver)

evaluator = Evaluator(
    examples=transform.outputs['transformed_examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)

I used the examples from the Transform component. Of course, I also changed the label key in the config to match the label name produced by the Transform component.
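For illustration, the adjusted config might look like the sketch below. The label key 'Category_xf' is a hypothetical name, since the actual name is not given here; it should be whatever the Transform step calls the integer-encoded label. Only the overall slice is kept, on the assumption that any feature_keys used for slicing would also have to exist in the transformed examples.

```python
import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[
        # 'Category_xf' is a hypothetical name for the integer-encoded label
        # written by the Transform component; replace it with the real key.
        tfma.ModelSpec(label_key='Category_xf'),
    ],
    metrics_specs=tfma.metrics.default_multi_class_classification_specs(),
    slicing_specs=[
        tfma.SlicingSpec(),  # overall slice across all examples
    ])
```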

I don't know if there is a 'cleaner' way to perform this (or if I'm doing this all wrong please correct me!)
