how to apply unique condition for a data frame like here

Question

Here is reproducible data

df <- structure(list(df_A = structure(c(1L, 2L, 2L, 3L, 3L, 4L, 5L, 
    5L, 5L, 6L, 7L, 7L, 8L), .Label = c("2016-05-10", "2016-05-11", 
    "2016-05-12", "2016-05-14", "2016-05-15", "2016-05-17", "2016-05-18", 
    "2016-05-19"), class = "factor"), df_B = structure(1:13, .Label = c("SO-101", 
    "SO-102", "SO-103", "SO-104", "SO-105", "SO-106", "SO-107", "SO-108", 
    "SO-109", "SO-110", "SO-111", "SO-112", "SO-113"), class = "factor"), 
        df_C = c(1113L, 2312L, 2312L, 2314L, 2314L, 2546L, 2315L, 
        2315L, 2315L, 5689L, 2546L, 2546L, 2312L)), .Names = c("df_A", 
    "df_B", "df_C"), class = "data.frame", row.names = c(NA, -13L
    ))
> df1
         df_A   df_B df_C
1  2016-05-10 SO-101 1113
2  2016-05-11 SO-102 2312
3  2016-05-11 SO-103 2312
4  2016-05-12 SO-104 2314
5  2016-05-12 SO-105 2314
6  2016-05-14 SO-106 2546
7  2016-05-15 SO-107 2315
8  2016-05-15 SO-108 2315
9  2016-05-15 SO-109 2315
10 2016-05-17 SO-110 5689
11 2016-05-18 SO-111 2546
12 2016-05-18 SO-112 2546
13 2016-05-19 SO-113 2312

let say
df_B column represents the production order.
df_C column represents the product certificate

For every production order there can be a same certificate as the previous order or can be a different one. I want to know date of product certificate changed. So i want to subset the rows in df_c column by using unique, but in that case i will loose the row with different production order, but same certificate which produced later.

df2 is my desired dataframe.

df2
        df_A   df_B df_C
1 2016-05-10 SO-101 1113
2 2016-05-11 SO-102 2312
3 2016-05-12 SO-104 2314
4 2016-05-14 SO-106 2546
5 2016-05-15 SO-107 2315
6 2016-05-17 SO-110 5689
7 2016-05-18 SO-111 2546
8 2016-05-19 SO-113 2312

Gregor Thomas · Answer 1 · 2016-05-26T16:50:06.370

Keeping unique values is equivalent to removing duplicated values. duplicated returns indexes of duplicate rows, so we can filter out rows that are duplicated in columns 1 and 3:

df[!duplicated(df[c(1, 3)]), ]
         df_A   df_B df_C
1  2016-05-10 SO-101 1113
2  2016-05-11 SO-102 2312
4  2016-05-12 SO-104 2314
6  2016-05-14 SO-106 2546
7  2016-05-15 SO-107 2315
10 2016-05-17 SO-110 5689
11 2016-05-18 SO-111 2546
13 2016-05-19 SO-113 2312

Also worth pointing out that the default duplicated argument is fromLast = FALSE, so this will keep the first instance of each df_A and df_C column. As long as your data is sorted by date to start with, this will give you the date the certificate changed as you request.

how to apply unique condition for a data frame like here

1 Answers1