
I have a set of rectangles that are usually parallel to each other (though this is not guaranteed) and usually do not overlap. I have attached an example below. The rectangles are a byproduct of OCR output, where the detected text has been replaced by its bounding boxes.

Example

I would like to consolidate these rectangles into bigger rectangles, and the resulting rectangles must not overlap. The grouping could look as follows:

Consolidation

What is the best way to do this? I could not find any previous answers on Stack Overflow.
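One simple baseline (not taken from the post) is to start with every OCR box as its own group and greedily merge any two groups whose bounding boxes touch, overlap, or lie within a distance threshold, repeating until nothing more can be merged. This is essentially single-linkage clustering on box gaps. The sketch below assumes axis-aligned boxes stored as `(x0, y0, x1, y1)`; rotated boxes would need to be deskewed first. The `max_gap` threshold and all function names are hypothetical, and the result is not guaranteed to be optimal under any particular criterion.

```python
from typing import List, Tuple

# Hypothetical representation: an axis-aligned box as (x0, y0, x1, y1) with x0 < x1, y0 < y1.
Rect = Tuple[float, float, float, float]


def bbox(rects: List[Rect]) -> Rect:
    """Smallest axis-aligned box enclosing all of `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def gap(a: Rect, b: Rect) -> float:
    """Larger of the horizontal and vertical gaps between two boxes (0 if they touch or overlap)."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)


def consolidate(rects: List[Rect], max_gap: float) -> List[Tuple[Rect, List[Rect]]]:
    """Greedily merge groups whose bounding boxes are within `max_gap` of each other.

    Stops when every pair of group boxes is more than `max_gap` apart, so for
    max_gap >= 0 the returned boxes are pairwise separated and never overlap.
    """
    groups: List[List[Rect]] = [[r] for r in rects]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if gap(bbox(groups[i]), bbox(groups[j])) <= max_gap:
                    groups[i].extend(groups.pop(j))  # absorb group j into group i
                    merged = True
                    break
            if merged:
                break  # the group list changed, so rescan from the start
    return [(bbox(g), g) for g in groups]


rects = [(0, 0, 10, 2), (0, 3, 10, 5), (30, 0, 40, 2), (30, 3, 38, 5), (0, 20, 5, 22)]
for box, members in consolidate(rects, max_gap=2):
    print(box, len(members))
# (0, 0, 10, 5) 2
# (30, 0, 40, 5) 2
# (0, 20, 5, 22) 1
```

The threshold controls the trade-off: a very large `max_gap` collapses everything into one rectangle, while `max_gap = 0` only merges boxes that already touch or overlap.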

  • If you "grouped" the top-right rectangle as its own group, would you consider that a better grouping than your example, or a worse one? What is considered "optimal"? – Tomer Shahar Mar 13 '19 at 14:33
  • The rectangles are actually placeholders for text recognized by OCR. The initial image is usually a scan of multiple receipts. As such: 1) grouping rectangles will usually be similar in size; 2) input rectangles close to each other should be part of the same grouping rectangle; 3) each grouping rectangle will contain nested rectangles with the same orientation. – Pierre Rebours Mar 13 '19 at 14:47
  • What if you group them all into one single large rectangle? Doesn't that meet the criteria? – Tomer Shahar Mar 14 '19 at 08:42
  • I see. Let me rephrase the optimization criteria: 1) find grouping rectangles that minimize the area inside them not occupied by the underlying rectangles (i.e. avoid white space); 2) minimize the number of grouping rectangles. – Pierre Rebours Mar 14 '19 at 12:38
  • Are you familiar with evolution-based algorithms? They are probabilistic, so there is no guarantee of getting an answer every time (or the same answer twice), but I believe you will need one. I can outline a generic solution, but you will need to do some research, and even then it may not work, or may not work as well as you need it to. – Tomer Shahar Mar 14 '19 at 13:03
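The two criteria from the comments (white space inside the grouping rectangles, and the number of groups) can be folded into a single cost, and the evolution-based idea can be tried with its simplest form, a (1+1) evolutionary algorithm: mutate one rectangle's group label at a time and keep the mutant if its cost is no worse. This is a minimal sketch under stated assumptions, not the method the commenter had in mind: boxes are axis-aligned `(x0, y0, x1, y1)` tuples, input boxes do not overlap (so covered area is just the sum of their areas), and the weight `group_weight`, the mutation rule, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: an axis-aligned box as (x0, y0, x1, y1).
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def bbox(rects: List[Rect]) -> Rect:
    """Smallest axis-aligned box enclosing all of `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b intersect (touching edges is allowed)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def groups_of(labels: List[int], rects: List[Rect]) -> List[List[Rect]]:
    """Collect the rects that share a label into one group."""
    buckets: Dict[int, List[Rect]] = {}
    for lab, r in zip(labels, rects):
        buckets.setdefault(lab, []).append(r)
    return list(buckets.values())


def cost(labels: List[int], rects: List[Rect], group_weight: float) -> float:
    """White space inside the group boxes + group_weight * number of groups.

    Labelings whose group boxes overlap are infeasible and get infinite cost.
    Assumes the input rects do not overlap, so covered area = sum of areas.
    """
    groups = groups_of(labels, rects)
    boxes = [bbox(g) for g in groups]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                return float("inf")
    white = sum(area(b) - sum(area(r) for r in g) for b, g in zip(boxes, groups))
    return white + group_weight * len(groups)


def evolve(rects: List[Rect], group_weight: float,
           steps: int = 5000, seed: int = 0) -> List[int]:
    """(1+1) evolutionary search: mutate one label, keep the child if it is no worse."""
    rng = random.Random(seed)
    n = len(rects)
    labels = list(range(n))  # start with every rect in its own group (feasible by assumption)
    best = cost(labels, rects, group_weight)
    for _ in range(steps):
        child = labels[:]
        i = rng.randrange(n)
        if rng.random() < 0.1:
            child[i] = max(child) + 1           # move rect i into a brand-new group
        else:
            child[i] = child[rng.randrange(n)]  # move rect i into a random rect's group
        c = cost(child, rects, group_weight)
        if c <= best:                           # accepting ties lets the search drift
            labels, best = child, c
    return labels


rects = [(0, 0, 10, 2), (0, 3, 10, 5), (30, 0, 40, 2), (30, 3, 38, 5)]
labels = evolve(rects, group_weight=50)
print(labels, cost(labels, rects, 50))
```

As the comment warns, there is no guarantee of a good answer: a single run can stall in a local optimum (for example, a grouping from which every single-label move creates overlapping boxes). Running several seeds and keeping the lowest-cost labeling is a cheap mitigation, and `group_weight` (in area units per group) sets how strongly fewer groups are preferred over less white space.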
