-2

I have set of data similar to this and wanted to get the earliest date of the SORT_DT when values of all the different columns match. Please help me solve this issue

df <- fread("CUST_NO ID_NO SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
         107   10120      1    2014-05-12             G/L  2015-01-09
        107   10120      1    2014-05-12             G/L  2015-11-10
        107   10120      1    2014-05-12             G/L  2014-06-18
        107   10120      1    2014-05-12             G/L  2014-05-13
        107   10120      1    2014-05-12             G/L  2015-07-10
        107   10120      1    2014-05-12             G/L  2015-10-09
        107   10120      1    2014-05-12             G/L  2016-04-08
        107   10120      1    2014-05-12             G/L  2016-01-08
        107   10120      1    2014-05-12             G/L  2016-12-22
        107   10120      1    2014-05-12             G/L  2017-01-13
        107   10120      1    2014-05-12             G/L  2016-07-08
        108   10120      1    2014-05-12             G/L  2017-04-14
        108   10120      1    2014-05-12             G/L  2017-04-17
        108   10120      1    2014-05-12             G/L  2016-08-31
        108   10120      1    2014-05-12             G/L  2015-04-10
        108   10120      1    2014-05-12             G/L  2016-12-22")

The output should be as below

  CUST_NO   ID_NO      SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
1         107     10120      1    2014-05-12             G/L     2014-05-13
2         108     10120      1    2014-05-12             G/L     2015-04-10 
Spacedman
  • 92,590
  • 12
  • 140
  • 224
Alex_P
  • 133
  • 2
  • 9
  • Consider reviewing the following SO answer: https://stackoverflow.com/questions/33436647/group-by-and-select-min-date-with-data-table. Your syntax would be as follows: `setDT(df)[, .SD[which.min(as.Date(SORT_DT))], by = CUST_NO]` – ccapizzano Sep 08 '17 at 01:36
  • @ccapizzano Thank you for your response. As my data set is large and did put only part of it due to security concerns can you please let me know how can I solve this question if all the rows of CUST_NO ID_NO SYMBOL AUTO_CREATE_DT CLASS_TYPE matches then selecting earliest date from SORT_DT – Alex_P Sep 08 '17 at 01:52

2 Answers2

0

Try this:

    df2 <- aggregate(df, list(df$CUST_NO, df$ID_NO, df$SYMBOL, df$AUTO_CREATE_DT, df$CLASS_TYPE), FUN = min)
    df2 <- df2[c("CUST_NO", "ID_NO", "SYMBOL", "AUTO_CREATE_DT", "CLASS_TYPE", "SORT_DT")]
dmb
  • 567
  • 5
  • 17
0
aggregate(SORT_DT ~ ., data = df, min)

#   CUST_NO ID_NO SYMBOL AUTO_CREATE_DT CLASS_TYPE    SORT_DT
# 1     107 10120      1     2014-05-12        G/L 2014-05-13
# 2     108 10120      1     2014-05-12        G/L 2015-04-10
acylam
  • 18,231
  • 5
  • 36
  • 45