2

I'm trying to remove duplicates based on one column and keep the last entry. Right now my formula is keeping the first value.

I'm using the formula found in this post: Selecting all rows with distinct column values - Google query language

Community
  • 1
  • 1
dsalzman
  • 23
  • 4
  • Is your data sorted like in the answer that you're using? – Tom Sharpe Apr 08 '17 at 09:07
  • No, my data is sorted from columns A through Y and the column I am looking for duplicates in is column N. The formula I'm using is: =ArrayFormula(iferror(VLOOKUP(unique(Data!N:N),{Data!N:N,Data!A:Y}, {2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28}, 0 ))) – dsalzman Apr 10 '17 at 12:53
  • I think it should be do-able but could do really with mocking up some data with just a few rows and columns to test a possible answer. – Tom Sharpe Apr 10 '17 at 14:05
  • Created a test spreadsheet here: https://docs.google.com/spreadsheets/d/1evizdaHFX_tA4_zd4-l2snj4x0jul5LKwopiYHmLMhI/edit?usp=sharing – dsalzman Apr 10 '17 at 15:57

1 Answers1

2

Well the short answer is just to change 0 (or false) in your formula to 1 (or true) so that VLOOKUP matches the last entry for each unique value

=ArrayFormula(iferror(VLOOKUP(unique(Data!D:D),{Data!D:D,Data!A:D}, {2,3,4,5},1 ),""))

This does appear to work for your test data

enter image description here

but that isn't the end of the story.

If you use VLOOKUP with this formula the data has to be sorted on the lookup column according to the documentation but in the comments above you said that you can't assume the data is sorted on the lookup column. Things do go horribly wrong if you try this on unsorted data. So you have to sort it on the lookup column like this

=ArrayFormula(iferror(VLOOKUP(sort(unique(Data1!D2:D),1,true),sort({Data1!D2:D,Data1!A2:D},1,true), {2,3,4,5},1 )))

the only slight downside being that this doesn't include the headings (because they would get sorted to the end of the data).

Here is the same test data sorted in descending order on ID

enter image description here

This gives the correct result (but without headers)

enter image description here

You can add the headers just by putting

=query(Data1!A:D,"select * limit 0")

above the data.

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37