1

I'm having trouble trying to achieve this in an accurate and automated way. I've tried the approaches discussed here, here and here, but none work in my scenario.

I have a spreadsheet with thousands of rows of data. Data is organised as follows:

  • Column A contains IP addresses in General format
  • Column B contains Date/Time in the following Custom format (d/mm/yyyy h:mm)
  • Column C contains duration in the following Custom format (h:mm:ss)

This data contains a number of duplicates I need to remove, based on the IP address in Column A. However, the criteria I need is to remove whichever duplicates are not the longest duration. To better explain my scenario, see sample image below:

enter image description here

I need a way to remove all duplicates of a particular IP address that do not contain the longest duration for that IP address. So, using the above example, row 3 would be deleted because the duration of 1 minute is shorter than 36 minutes in row 4 that contains the same IP address.

Another example is that rows 5, 6 and 7 would also be removed as all their durations are shorter than row 8 which has the same IP address but a longer duration. Of course, any rows already containing unique IP addresses would be left alone. The end result using my above sample would be as follows:

enter image description here

Of course, in my sample above all the data was nicely sorted by IP address first and Duration second. In real life this isn't the case, but that's something easy enough for me to do prior to any solution, if necessary.

The key thing is that in some cases an IP address may be duplicated once, in others it may be duplicated many times over. I just need to ensure that only the one with the longest duration remains. In the event that multiple instances of an IP address has the same longest duration, then I want them all kept. That is, if an IP address is repeated ten times and its longest duration is an hour for two of those times, then both of them need to remain.

I'm happy with any solution for this, be it using formulas, functions or macros.

Community
  • 1
  • 1
Monomeeth
  • 753
  • 3
  • 13
  • 29
  • 1
    So the date does not matter? – dgorti Nov 10 '16 at 09:05
  • Sort column C from largest to smallest and use the Excel function "remove duplicates" on column A? Edit: won't work for your final criteria (keeping duplicates with identical duration). – Vegard Nov 10 '16 at 09:07
  • If the date doesn't matter and you are simply looking for unique values in column `A` and then get the `max` for all of these unique values in column `C` then you should use this post to get all unique IP values: http://stackoverflow.com/questions/40514343/extracting-data-ignore-duplicates-compile-information#comment68275496_40514343 (since the IP values are text values). Then get the `max` in column `C` based on the text in column `A` and then get the `date` in column `B` based on a match of column `A` & column `C`. – Ralph Nov 10 '16 at 09:10

1 Answers1

1

You can solve your task using the helper column (column D).

  1. Insert the following array formula to the cell D2:

    =IF($C2=MAX(IF($A2=$A$2:$A$50,$C$2:$C$50,-1)),"Remain","Remove")

    where 50 - the last row of your table

    Remember to press Ctrl+Shift+Enter to complete the array formula correctly.

  2. Copy/paste the formula to the other cells.

  3. Аpply filter to column D by "remove" value

  4. Delete filtered rows.