
I use different hardware to benchmark multiple possibilities. The code runs in a Jupyter notebook.

When I evaluate the different losses, I get highly divergent results.

I also checked the full config with cfg.dump() - it is completely consistent across environments.
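
A concrete way to do this check (a minimal sketch, assuming the cfg object defined below) is to write the resolved config to a file on each machine and diff the files:

# Sketch: dump the resolved config to disk on each machine,
# then compare, e.g. with `diff cfg_azure.yaml cfg_colab.yaml`.
with open("cfg_dump.yaml", "w") as f:
    f.write(cfg.dump())  # cfg.dump() returns the full config as a YAML string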

Detectron2 Parameters:

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("dataset_train",)
cfg.DATASETS.TEST = ("dataset_test",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_101_FPN_3x.yaml")  # initialize training from model zoo weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 1200    # you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []         # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512   # default: 512; RetinaNet has no ROI heads, so this setting has no effect here
#cfg.MODEL.ROI_HEADS.NUM_CLASSES = 25  # leftover from the tutorial, not used by RetinaNet (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
cfg.MODEL.RETINANET.NUM_CLASSES = 3
# NOTE: this is the number of classes; a few popular unofficial tutorials incorrectly use num_classes + 1 here.
cfg.OUTPUT_DIR = "/content/drive/MyDrive/Colab_Notebooks/testrun/output"
cfg.TEST.EVAL_PERIOD = 25
cfg.SEED = 5
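
For completeness, here is a minimal sketch of how such a config is typically driven. The post does not show the dataset registration or the training call, so the COCO-format paths and the DefaultTrainer subclass below are assumptions:

import os
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

# Hypothetical registration - the actual annotation format and paths are not shown in the post.
register_coco_instances("dataset_train", {}, "annotations/train.json", "images/train")
register_coco_instances("dataset_test", {}, "annotations/test.json", "images/test")

class Trainer(DefaultTrainer):
    # TEST.EVAL_PERIOD only triggers evaluation if an evaluator is provided.
    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        return COCOEvaluator(dataset_name, output_dir=cfg.OUTPUT_DIR)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = Trainer(cfg)
trainer.resume_or_load(resume=False)  # start from the model zoo weights set above
trainer.train()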

1. Environment: Azure

Microsoft Azure - Machine Learning
STANDARD_NC6
Torch: 1.9.0+cu111

Results:

[Results Azure]

Training Log: [Log Azure]


2. Environment: Colab

Google Colab (free tier)

Torch: 1.9.0+cu111 

Results:

[Results Google Colab]

Training Log: [Log Colab]


EDIT:

3. Environment: Ubuntu

Ubuntu 22.04
RTX 3080
Torch: 1.9.0+cu111

Results:

[Results Ubuntu]

Training Log: https://pastebin.com/PwXMz4hY
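
Since only the Torch version is listed for each environment, it may also be worth comparing the full library stack. Detectron2 ships a helper for this (a sketch, not part of the original runs):

from detectron2.utils.collect_env import collect_env_info

# Prints PyTorch/CUDA/cuDNN/detectron2 versions, GPU model, compiler, etc.
# Running this in each environment and diffing the output reveals library
# differences that cfg.dump() cannot catch.
print(collect_env_info())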


New dataset

The issue is not reproducible with a larger dataset:

[Comparison with larger dataset]

  • Do you have a third system to test on? Perhaps something CPU-only? That'd let you guess which of the two is wrong/broken -- can you run any of the libraries' tests that check numerics for correctness? – Christoph Rackwitz Jun 30 '22 at 13:20
  • Good point - this was also my thought. I will test it tomorrow on a third system (Ubuntu & local GPU) to have another comparison, and will edit the main post after running the third test. – Natrium2 Jun 30 '22 at 13:21
  • Fascinating. So your Ubuntu environment agrees with Azure, so either they're both "broken" or Google Colab does some kind of magic to improve training... or some libraries differ that you haven't accounted for. – Christoph Rackwitz Jul 04 '22 at 15:10
  • That's the question at this point. I will do the test again with a different dataset (maybe the amount of images had some strange impact) and will edit the results afterwards in this thread. – Natrium2 Jul 05 '22 at 13:18
  • Now that you mention it, that looks like what's going on. I would have assumed that you work with identical data. Validation loss goes up again, evidence of overfitting. I think you are giving it more data on the Colab instance than on the Ubuntu or Azure instances. – Christoph Rackwitz Jul 05 '22 at 13:22
  • Your assumption is right - I used the same dataset for all environments: 20 images train, 10 images test, and no random splitting. Maybe the amount for this test job was just too low - but I think that doesn't explain the different results. I will start a new test tomorrow with 400/200 pre-labeled images and check the results. – Natrium2 Jul 05 '22 at 14:20
  • Seems like the dataset was responsible for this issue - I can't reproduce it with a larger dataset, so everything seems to be fine. Find the comparison image in the thread. – Natrium2 Jul 07 '22 at 06:45
