I'm trying to remove duplicates based on one column and keep the last entry. Right now my formula is keeping the first value.
I'm using the formula found in this post: Selecting all rows with distinct column values - Google query language
I'm trying to remove duplicates based on one column and keep the last entry. Right now my formula is keeping the first value.
I'm using the formula found in this post: Selecting all rows with distinct column values - Google query language
Well the short answer is just to change 0 (or false) in your formula to 1 (or true) so that VLOOKUP matches the last entry for each unique value
=ArrayFormula(iferror(VLOOKUP(unique(Data!D:D),{Data!D:D,Data!A:D}, {2,3,4,5},1 ),""))
This does appear to work for your test data
but that isn't the end of the story.
If you use VLOOKUP with this formula the data has to be sorted on the lookup column according to the documentation but in the comments above you said that you can't assume the data is sorted on the lookup column. Things do go horribly wrong if you try this on unsorted data. So you have to sort it on the lookup column like this
=ArrayFormula(iferror(VLOOKUP(sort(unique(Data1!D2:D),1,true),sort({Data1!D2:D,Data1!A2:D},1,true), {2,3,4,5},1 )))
the only slight downside being that this doesn't include the headings (because they would get sorted to the end of the data).
Here is the same test data sorted in descending order on ID
This gives the correct result (but without headers)
You can add the headers just by putting
=query(Data1!A:D,"select * limit 0")
above the data.