I have a flat file, it looks like this
┌───────┬──────────┬──────────┐ │ ITEM │ QTY_ON_HAND LOCATION│ ├───────┼──────────┼──────────┤ │ BOX │ 1 │ A │ │ BOX │ 0 │ B │ │ THING │ 1 │ C │ │ OTHER │ 2 │ A │ └───────┴──────────┴──────────┘
ITEM can have duplicates, I need to keep the one where QTY_ONHAND is highest (first BOX in example above), if both have the same value, then I would keep any occurrence.
Approaches:
Using Sort + Remove rows with duplicate sort values. Sort will delete random occurrences, I can't guarantee that the one with highest value will remain
Script Component. I can detect duplicates in a Script Component similar to this answer but I'd have to store all the rows in memory, compare each row against all the stored set of rows, keep the highest, remove the duplicates and somehow return only the valid rows, sounds very inefficient, however, there are not too many rows so I'll consider it (I'm still not sure how to achieve this).
Aggregate Transformation. I understand I can group the rows by ITEM and use the MAX operation to keep the one with highest value, this is the one I'm trying to do right now, but I'm stuck.
Sources:
SSIS: Flat File Source to SQL without Duplicate Rows
Remove duplicate in SSIS package with preference over a column data