+------------+-----+--------+-----+-------------+
| Meth.name | Min| Max |Layer| Global name |
+------------+-----+--------+-----+-------------+
| DTS | 2600| 3041.2 | AC1 | DTS |
| GGK | 1800| 3200.0 | AC1 | DEN |
| DTP | 700 | 3041.0 | AC2 | DT |
| DS | 700 | 3041.0 | AC3 | CALI |
| PF1 | 2800| 3012.0 | AC3 | CALI |
| PF2 | 3000| 3041.0 | AC4 | CALI |
+------------+-----+--------+-----+-------------+
We need to drop the rows that are duplicated by the "Global name" column, but in a specific way: within each duplicated group we want to keep the single row that has the biggest intersection with a reference range, computed as the maximum of column "Min" and the minimum of column "Max" over the non-duplicated rows.
In the example above this range is [2600.0; 3041.0]. The intersection lengths of the three 'CALI' rows with it are 441.0 (DS), 212.0 (PF1) and 41.0 (PF2), so only the row with ['Meth.name'] == 'DS' should be kept,
and the overall result should look like:
+------------+-----+--------+-----+-------------+
| Meth.name | Min| Max |Layer| Global name |
+------------+-----+--------+-----+-------------+
| DTS | 2600| 3041.2 | AC1 | DTS |
| GGK | 1800| 3200.0 | AC1 | DEN |
| DTP | 700 | 3041.0 | AC2 | DT |
| DS | 700 | 3041.0 | AC3 | CALI |
+------------+-----+--------+-----+-------------+
This problem can, of course, be solved in several passes (compute the reference range from the non-duplicated rows, then iterate over the duplicated rows and keep only those giving the biggest intersection), but I am trying to find the most efficient approach. Thank you.
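For reference, here is a single-pass sketch of the idea above (the `clip`-based overlap computation and the `groupby`/`idxmax` selection are one possible vectorized formulation, not necessarily the most efficient one):

```python
import pandas as pd

df = pd.DataFrame({
    "Meth.name":   ["DTS", "GGK", "DTP", "DS", "PF1", "PF2"],
    "Min":         [2600.0, 1800.0, 700.0, 700.0, 2800.0, 3000.0],
    "Max":         [3041.2, 3200.0, 3041.0, 3041.0, 3012.0, 3041.0],
    "Layer":       ["AC1", "AC1", "AC2", "AC3", "AC3", "AC4"],
    "Global name": ["DTS", "DEN", "DT", "CALI", "CALI", "CALI"],
})

# Reference range [max(Min), min(Max)] over the rows whose
# "Global name" occurs only once -> [2600.0, 3041.0] here.
dup = df["Global name"].duplicated(keep=False)
lo = df.loc[~dup, "Min"].max()
hi = df.loc[~dup, "Max"].min()

# Length of each row's [Min, Max] intersection with [lo, hi],
# clipped at 0 for rows that do not overlap the range at all.
overlap = (df["Max"].clip(upper=hi) - df["Min"].clip(lower=lo)).clip(lower=0)

# Keep, per "Global name", the row with the largest overlap
# (unique names are trivially their own group's maximum).
result = df.loc[overlap.groupby(df["Global name"]).idxmax().sort_values()]
```

On the example data this leaves exactly the four rows of the expected output, with 'DS' chosen among the 'CALI' duplicates.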