
I am trying to calculate the intersection over union (IoU) of predicted bounding boxes with their corresponding ground truth boxes. The problem is that the model crops the image in order to locate the object (I cannot change that). So now I have images with different sizes than the originals, and the coordinates of the predicted bounding boxes are relative to the new image size. What is the best way to calculate the intersection over union with the ground truth in this situation? I tried rescaling the predicted image to the original size (and rescaling the predicted coordinates), but some boxes become so small that they collapse into a line (some bounding boxes end up with the same y value for ymin and ymax). So what should I do, or how should I proceed?

I edited my question for clarification: there is no fixed size for the original images or for the cropped ones. Each original image contains tables, and the model crops these tables. The ground truth is the coordinates of the table cells in the original image, and the predicted boxes are the coordinates of the cells of each table in the cropped image. I opted for linear interpolation to map the predicted coordinates of the cells back to the original image, but because the original image is small (for example 594 x 845), the rescaled coordinates become very small. For example, a predicted box [696, 0, 1414, 48] becomes [414, 0, 888, 0] after linear interpolation, so it is now a line, not a rectangle. The image produced by the model in this case has size 1000 x 1048.
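For reference, here is a minimal sketch of the mapping I have in mind, assuming the crop region in the original image is known. The names (box_to_original, iou, crop_region, cropped_size) are hypothetical, for illustration only; keeping the coordinates as floats avoids boxes collapsing into lines through premature rounding:

    def box_to_original(box, crop_region, cropped_size):
        """Map [xmin, ymin, xmax, ymax] from cropped-image coordinates
        back to the original image.

        crop_region  : [x0, y0, x1, y1] of the crop in the ORIGINAL image
        cropped_size : (width, height) of the cropped image the model outputs
        """
        x0, y0, x1, y1 = crop_region
        cw, ch = cropped_size
        sx = (x1 - x0) / cw  # original pixels per cropped pixel, horizontally
        sy = (y1 - y0) / ch  # same, vertically
        xmin, ymin, xmax, ymax = box
        return [x0 + xmin * sx, y0 + ymin * sy, x0 + xmax * sx, y0 + ymax * sy]

    def iou(a, b):
        """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

As a sanity check with the numbers above, treating the crop as if it covered the whole original (pure scaling): box_to_original([696, 0, 1414, 48], [0, 0, 594, 845], (1000, 1048)) gives roughly [413.4, 0.0, 839.9, 38.7], so ymax maps to about 38.7 rather than 0. This suggests the degenerate line comes from the interpolation formula or from rounding, not from the image sizes themselves.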

  • Please give an example of a problematic sample, ideally with a picture of the original image and the modified one, and/or the predicted + ground truth boxes. Also, any matching information you have between the original and modified image could be useful – Tawy Aug 11 '21 at 14:44
  • There is no specified fixed size for the original images nor for the cropped ones. Each original image has tables in it, and the model will crop these tables. The ground truth is the coordinates of the cells of the tables in the original image, and the predicted boxes are the coordinates of the cells of each table (in the cropped image). I opted for linear interpolation to calculate the predicted coordinates of the cells as they would be in the original image, but because the original image is small (for example 598 x 848), the calculated coordinates become very small. – Youssef Maghrebi Aug 11 '21 at 16:55
  • For example: a predicted box [696,0,1414,48] becomes [414,0,888,0] after linear interpolation, so it is now a line, not a rectangle – Youssef Maghrebi Aug 11 '21 at 16:55
  • What is your formula for linear interpolation? It should not become a line in your example. – Tawy Aug 11 '21 at 17:21
  • Could you edit the question to give all the information related to your faulty sample? For example: the size of the original image (598x848), the size of the cropped image, the location of the cropped image relative to the original image, the ground truth bbox relative to the original image, and the predicted bbox relative to the cropped image ([696,0,1414,48]). – Tawy Aug 11 '21 at 17:41
  • I edited my question and gave all the information requested in your comment – Youssef Maghrebi Aug 11 '21 at 18:23
  • The most important information is still missing: which part of the original image has the cropped image been extracted from? E.g. it could be the region [100,100,300,310] magnified 5 times. Do you have this information? – Tawy Aug 11 '21 at 18:43
  • I don't have this info for this example. I will try to look for another one that has it. But in case what you are saying is correct, what conclusion could we draw? – Youssef Maghrebi Aug 11 '21 at 18:57
  • Ouch! You definitely need this information to compute the IoU. There are ways to retrieve it, e.g. using https://stackoverflow.com/questions/7670112/finding-a-subimage-inside-a-numpy-image but the fact that it has been resized adds a layer of complexity, and you will probably need to use FFT. I will probably not have time to write a satisfying answer anytime soon. If you can share the two images from your example, it might still help for testing, though. – Tawy Aug 11 '21 at 19:16
  • The problem is that when the model detects the table in the original image, it crops that table, but the size of the final cropped image is bigger than the original. In this case OpenCV's matchTemplate won't work directly, because the template must be smaller than the image it is searched in (see the multi-scale matching sketch just after this thread). – Youssef Maghrebi Aug 11 '21 at 19:27
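Following up on the size constraint raised in the last two comments: one possible workaround, sketched below, is to downscale the (larger) crop over a range of candidate scales until it fits inside the original, run OpenCV's matchTemplate at each scale, and keep the best-scoring location. This assumes the crop preserves the aspect ratio of the table region; locate_crop and the scale range are hypothetical choices, not from any library. The recovered region could then be fed to box_to_original above.

    import cv2
    import numpy as np

    def locate_crop(original_gray, crop_gray, scales=np.linspace(0.3, 1.0, 29)):
        """Estimate (x0, y0, x1, y1), the region of the original image that the
        crop was extracted from, by brute-force multi-scale template matching."""
        oh, ow = original_gray.shape[:2]
        best = None  # (score, x0, y0, x1, y1)
        for s in scales:
            th = int(crop_gray.shape[0] * s)
            tw = int(crop_gray.shape[1] * s)
            if th < 8 or tw < 8 or th > oh or tw > ow:
                continue  # template must fit inside the search image
            template = cv2.resize(crop_gray, (tw, th), interpolation=cv2.INTER_AREA)
            result = cv2.matchTemplate(original_gray, template, cv2.TM_CCOEFF_NORMED)
            _, score, _, (x, y) = cv2.minMaxLoc(result)
            if best is None or score > best[0]:
                best = (score, x, y, x + tw, y + th)
        if best is None:
            raise ValueError("crop did not fit inside the original at any tried scale")
        return best[1:]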

0 Answers