
I'm trying to train a net for semantic segmentation with class imbalance. To account for this, I tried to implement the InfogainLoss layer and specified the infogain_matrix as posted here, except that I used 1 - frequency(class) for each diagonal element.
When training the net, however, both accuracy and loss immediately converge to 0, even with a low base_lr, and the net labels everything as class 0 ("unknown"). My question now is whether the infogain_matrix should be specified as in the post I linked, and if so, what other reasons there could be for this unusual behaviour of the net (I expected either loss 0 / accuracy 1 or loss INF / accuracy 0).
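For reference, a minimal sketch of how I construct such a matrix (assuming freq holds the precomputed per-class pixel frequencies; variable names are placeholders):

import numpy as np

# freq[k]: fraction of pixels belonging to class k, precomputed from the labels
num_classes = len(freq)
H = np.zeros((num_classes, num_classes), dtype=np.float32)
for k in range(num_classes):
    H[k, k] = 1.0 - freq[k]  # rare classes get diagonal weights close to 1

H is then written as float32 to its own lmdb and fed to the loss layer via a data layer.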

Edit:
So when I run the net with the SoftmaxWithLoss layer instead of InfogainLoss, it immediately starts to classify everything as the most frequent class (class 1, ~90% of the pixels) and doesn't change anymore. My guess now is that I configured the lmdb for the infogain_matrix incorrectly. Does anybody know whether one has to specify the dtype of the lmdb for the caffe data layer (images and infogain_matrix are stored as float32)? The caffe documentation for the layer does not say so. Or, more generally, what dtypes does the caffe data layer expect from the lmdb?
The lmdbs were generated using code taken/modified from here, but for the images mean subtraction was performed beforehand. I tested the lmdb readout in Python, and there I had to specify the dtype explicitly, as otherwise reshaping into the original matrix dimensions threw errors.
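The readout looked roughly like this (lmdb path and key are placeholders; the crucial part is the explicit dtype=np.float32 when interpreting the buffer):

import lmdb
import numpy as np
import caffe

env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
    raw = txn.get(b'00000000')  # key format used when writing

datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw)
# without the explicit dtype, the buffer is interpreted as uint8 and
# reshaping into (channels, height, width) throws
arr = np.frombuffer(datum.data, dtype=np.float32)
arr = arr.reshape(datum.channels, datum.height, datum.width)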

Edit2:
So the mistake indeed was in the lmdb definition: for dtype=float, data needs to be appended to datum.float_data instead of datum.data, see here. Now everything looks okay and the accuracies and losses are no longer whacky :)

ORippler

1 Answer


The mistake was in the lmdb definition, as for dtype=float, the data needs to be appended to datum.float_data instead of datum.data (which needs to be left empty so that caffe automatically scans datum.float_data); SOURCE

So the code from here for generating an lmdb dataset with Python could be modified as follows (X, y and the lmdb path are as in that example):

import lmdb
import caffe

# X: float32 array of shape (N, channels, height, width); y: the labels
N = X.shape[0]
env = lmdb.open('mylmdb', map_size=X.nbytes * 10)

with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        # append the float values to float_data and leave datum.data empty
        datum.float_data.extend(X[i].astype(float).flat)
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)
        # the encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())

The thing is, caffe doesn't throw an error if you wrongly append float data to datum.data instead of datum.float_data, but it will result in whacky behaviour like accuracy and loss both going to 0 (as the infogain matrix H may be read as 0 for certain classes due to the dtype mismatch).
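Since caffe stays silent about the mistake, a quick sanity check (a sketch; lmdb path and key are placeholders) is to parse a datum back and confirm that float_data, not data, is populated:

import lmdb
import numpy as np
import caffe

env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
    datum = caffe.proto.caffe_pb2.Datum()
    datum.ParseFromString(txn.get(b'00000000'))

# correctly written float datums have empty data and populated float_data
print(len(datum.data), len(datum.float_data))
arr = np.array(datum.float_data, dtype=np.float32).reshape(
    datum.channels, datum.height, datum.width)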

ORippler