I'm having trouble trying to achieve this in an accurate and automated way. I've tried the approaches discussed here, here and here, but none work in my scenario.
I have a spreadsheet with thousands of rows of data. Data is organised as follows:
- Column A contains IP addresses in General format
- Column B contains Date/Time in the following Custom format (d/mm/yyyy h:mm)
- Column C contains duration in the following Custom format (h:mm:ss)
This data contains a number of duplicates I need to remove, based on the IP address in Column A. However, the criteria I need is to remove whichever duplicates are not the longest duration. To better explain my scenario, see sample image below:
I need a way to remove all duplicates of a particular IP address that do not contain the longest duration for that IP address. So, using the above example, row 3 would be deleted because the duration of 1 minute is shorter than 36 minutes in row 4 that contains the same IP address.
Another example is that rows 5, 6 and 7 would also be removed as all their durations are shorter than row 8 which has the same IP address but a longer duration. Of course, any rows already containing unique IP addresses would be left alone. The end result using my above sample would be as follows:
Of course, in my sample above all the data was nicely sorted by IP address first and Duration second. In real life this isn't the case, but that's something easy enough for me to do prior to any solution, if necessary.
The key thing is that in some cases an IP address may be duplicated once, in others it may be duplicated many times over. I just need to ensure that only the one with the longest duration remains. In the event that multiple instances of an IP address has the same longest duration, then I want them all kept. That is, if an IP address is repeated ten times and its longest duration is an hour for two of those times, then both of them need to remain.
I'm happy with any solution for this, be it using formulas, functions or macros.