
I am using a convolutional neural network.

My data is quite imbalanced: I have two classes.

My first class contains: 551,462 image files

My second class contains: 52,377 image files

I want to use weighted_cross_entropy_with_logits, but I'm not sure I'm calculating the pos_weight variable correctly.

Right now I'm using

classes_weights = tf.constant([0.0949784, 1.0])
cross_entropy = tf.reduce_mean(
    tf.nn.weighted_cross_entropy_with_logits(
        logits=logits, targets=y_, pos_weight=classes_weights))
train_step = tf.train.AdamOptimizer(LEARNING_RATE, epsilon=1e-03).minimize(
    cross_entropy, global_step=global_step)

Or should I use

classes_weights = 10.5287

2 Answers


From the documentation:

pos_weight: A coefficient to use on the positive examples.

and

The argument pos_weight is used as a multiplier for the positive targets:

So if your first class is the positive one, then pos_weight = 52,377 / 551,462; otherwise, pos_weight = 551,462 / 52,377.
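With the counts from the question, this works out as follows (a minimal sketch; the variable names are just for illustration, and it assumes the minority class is the positive one):

```python
# Class counts from the question. Which class is "positive" depends on how
# the labels are encoded; this sketch assumes the minority class is positive.
num_negative = 551_462  # first class
num_positive = 52_377   # second class

# pos_weight scales the loss on positive examples so that both classes
# contribute roughly equally to the total loss.
pos_weight = num_negative / num_positive
print(round(pos_weight, 4))  # roughly 10.5287
```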

Salvador Dali
  • I was thinking this way, but I saw a couple of examples where people have used an array of class coefficients as an input [source](http://stackoverflow.com/a/42163122/1574139). Also, running the code with `pos_weights = 10.5287` keeps the loss at very high numbers. Even after 60,600 iterations with mini-batches of 50, the mean loss at some point was over 1.0, which doesn't seem right. And it seems class 1 already has much better accuracy while class 2 is not improving that well. – Darius Šilkaitis Apr 23 '17 at 10:44
  • @DariusŠilkaitis this is what the documentation says, and I have more trust in it than in one lonely answer on SO. You tried my approach and are not satisfied with the results, but have you tried the other approach, `tf.constant([0.0949784, 1.0])`? – Salvador Dali Apr 23 '17 at 18:58
  • Training is quite slow on such big data, so I haven't tried either solution thoroughly yet. With `tf.constant([0.0949784, 1.0])` the loss seemed way too low to my eye, but I was getting better accuracy. It will take me a couple of days to try both configurations for at least 20 epochs each. I'll update the results here. Thanks for the help. – Darius Šilkaitis Apr 23 '17 at 19:08
  • @SalvadorDali trying this approach with a scalar value for pos_weight classifies all of the majority class (0's) as the minority class (1's), substantially increasing false positives. Any clue why that may be? Thanks in advance. – mamafoku Aug 01 '17 at 15:32

As @Salvador Dali said, the best source is the source code itself: https://github.com/tensorflow/tensorflow/blob/5b10b3474bea72e29875264bb34be476e187039c/tensorflow/python/ops/nn_impl.py#L183

We have

log_weight = 1 + (pos_weight - 1) * targets

so pos_weight only takes effect when targets == 1:

If targets == 0, then log_weight = 1.

If targets == 1, then log_weight = pos_weight.

So if the ratio of positives to negatives is x/y, we need pos_weight to be y/x so that both categories contribute equally to the total loss.

Please note that each scalar in the targets tensor corresponds to a category, so each member of pos_weight corresponds to a category as well (not to the positive or negative probability of a single category).
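The two cases above can be checked with a small pure-Python sketch of the numerically stable formula used in nn_impl.py (this re-implements the math for illustration only; it is not the TF code itself):

```python
import math

def weighted_ce(logit, target, pos_weight):
    """Per-element weighted sigmoid cross entropy, mirroring the stable
    formula in TensorFlow's nn_impl.py."""
    # The weight is 1 for negative targets and pos_weight for positive ones.
    log_weight = 1 + (pos_weight - 1) * target
    return ((1 - target) * logit
            + log_weight * (math.log1p(math.exp(-abs(logit)))
                            + max(-logit, 0.0)))

# target == 0: pos_weight has no effect; loss is -log(1 - sigmoid(logit))
# target == 1: loss is pos_weight * -log(sigmoid(logit))
```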

Sergei