In Excel, remove duplicates from one column based on the values in another column, either through VBA or a combination of formulas/functions

Question

I'm having trouble trying to achieve this in an accurate and automated way. I've tried the approaches discussed here, here and here, but none work in my scenario.

I have a spreadsheet with thousands of rows of data. Data is organised as follows:

Column A contains IP addresses in General format
Column B contains Date/Time in the following Custom format (d/mm/yyyy h:mm)
Column C contains duration in the following Custom format (h:mm:ss)

This data contains a number of duplicates I need to remove, based on the IP address in Column A. However, the criteria I need is to remove whichever duplicates are not the longest duration. To better explain my scenario, see sample image below:

I need a way to remove all duplicates of a particular IP address that do not contain the longest duration for that IP address. So, using the above example, row 3 would be deleted because the duration of 1 minute is shorter than 36 minutes in row 4 that contains the same IP address.

Another example is that rows 5, 6 and 7 would also be removed as all their durations are shorter than row 8 which has the same IP address but a longer duration. Of course, any rows already containing unique IP addresses would be left alone. The end result using my above sample would be as follows:

Of course, in my sample above all the data was nicely sorted by IP address first and Duration second. In real life this isn't the case, but that's something easy enough for me to do prior to any solution, if necessary.

The key thing is that in some cases an IP address may be duplicated once, in others it may be duplicated many times over. I just need to ensure that only the one with the longest duration remains. In the event that multiple instances of an IP address has the same longest duration, then I want them all kept. That is, if an IP address is repeated ten times and its longest duration is an hour for two of those times, then both of them need to remain.

I'm happy with any solution for this, be it using formulas, functions or macros.

Sort column C from largest to smallest and use the Excel function "remove duplicates" on column A? Edit: won't work for your final criteria (keeping duplicates with identical duration). — Vegard, Nov 10 '16 at 09:07
If the date doesn't matter and you are simply looking for unique values in column `A` and then get the `max` for all of these unique values in column `C` then you should use this post to get all unique IP values: http://stackoverflow.com/questions/40514343/extracting-data-ignore-duplicates-compile-information#comment68275496_40514343 (since the IP values are text values). Then get the `max` in column `C` based on the text in column `A` and then get the `date` in column `B` based on a match of column `A` & column `C`. — Ralph, Nov 10 '16 at 09:10

score 1 · Accepted Answer · answered Nov 10 '16 at 20:38

1

You can solve your task using the helper column (column D).

Insert the following array formula to the cell D2:

=IF($C2=MAX(IF($A2=$A$2:$A$50,$C$2:$C$50,-1)),"Remain","Remove")

where 50 - the last row of your table

Remember to press Ctrl+Shift+Enter to complete the array formula correctly.
Copy/paste the formula to the other cells.
Аpply filter to column D by "remove" value
Delete filtered rows.

answered Nov 10 '16 at 20:38

Alexander Frolov

306
2
9

Thank you! This saved me an awful LOT of time. :) – Monomeeth Nov 14 '16 at 02:23

In Excel, remove duplicates from one column based on the values in another column, either through VBA or a combination of formulas/functions

1 Answers1