
I have a dataset that contains customer purchase information. I tried to create a unique ID by concatenating device_id (of the customer), store_id, product_id and date (of purchase), using the following code:

customer$device_store_product_date <- paste(customer$device, customer$store_id, customer$product_id, customer$date, sep='_')

The resulting column looks like this:

        device_store_product_date
48c6eec37affa1db_203723_9313962_2016-02-19
eb2c2f00071b97f3_179926_6180944_2016-02-20
d82066a784c9552_180704_9308311_2016-02-20
9766bba65b1ef9ac_204187_9313852_2016-02-20
77d80c1066f5267_180488_9312672_2016-02-20
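A minimal sketch, assuming a small hypothetical customer data frame that uses the column names from the code above (device, store_id, product_id, date) with made-up values. It rebuilds the key with paste() and checks that duplicated() applied directly to the key columns gives the same flags. duplicated() has a data frame method that compares whole rows, so the concatenated string key is optional.

```r
# Hypothetical toy data; column names mirror the question's customer data frame
customer <- data.frame(
  device     = c("48c6eec37affa1db", "eb2c2f00071b97f3", "48c6eec37affa1db"),
  store_id   = c(203723L, 179926L, 203723L),
  product_id = c(9313962L, 6180944L, 9313962L),
  date       = as.Date(c("2016-02-19", "2016-02-20", "2016-02-19")),
  stringsAsFactors = FALSE
)

# Build the composite key the same way as in the question
customer$device_store_product_date <- paste(customer$device, customer$store_id,
                                            customer$product_id, customer$date,
                                            sep = "_")

# duplicated() on a data frame compares whole rows, so the key columns
# can be checked directly without building a string key first
key_cols <- c("device", "store_id", "product_id", "date")
flags_from_key  <- duplicated(customer$device_store_product_date)
flags_from_cols <- duplicated(customer[key_cols])
same_as_key <- all(flags_from_key == flags_from_cols)
```

Both approaches flag only row 3 here, because it repeats row 1.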

As expected, there are still duplicates. To identify them I used duplicated():

x1 <- customer[duplicated(customer$device_store_product_date), ]

However, some of the x1$device_store_product_date values have only a single entry in x1. That should not be the case, since x1 should consist of repeated values. Where am I going wrong? To select the rows for a particular value of device_store_product_date I used:

library(dplyr)
filter(x1, device_store_product_date == "14163e6b6ed06890_203723_9313477_2016-02-20")
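A minimal sketch of why single entries show up, using a hypothetical key vector in place of customer$device_store_product_date. duplicated() flags only the second and later occurrences of each value, so a key that occurs exactly twice in the data survives only once in a subset built from it.

```r
# Hypothetical key vector standing in for customer$device_store_product_date
key <- c("a_1", "b_2", "a_1", "c_3", "a_1")

# duplicated() flags only repeats of an EARLIER element;
# the first occurrence of each value is FALSE
dup <- duplicated(key)    # FALSE FALSE TRUE FALSE TRUE

# Subsetting with it, as x1 is built in the question, drops first occurrences
x1_key <- key[dup]        # "a_1" "a_1"

# A value that occurs exactly twice therefore appears only once in the subset
key2 <- c("p_1", "q_2", "p_1")
single <- key2[duplicated(key2)]   # "p_1"
```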
Asked by Arshad Islam (edited by David Arenburg)

2 Answers


duplicated() returns TRUE for every element that repeats an earlier element, so

x <- c("a", "b", "a")
duplicated(x)

will return

FALSE FALSE TRUE

If you want to flag the first occurrences as well, something like this will work:

duplicated(x) | rev(duplicated(rev(x)))
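A step-by-step sketch of the expression above on the same vector x. The intermediate variable names forward, backward and all_dups are only for illustration.

```r
x <- c("a", "b", "a")

# Forward pass: TRUE for repeats of an earlier element
forward <- duplicated(x)              # FALSE FALSE TRUE

# Backward pass: reverse x, find repeats, then reverse the result back
# so each flag lines up with the original position in x
backward <- rev(duplicated(rev(x)))   # TRUE FALSE FALSE

# Element-wise OR: TRUE for every element whose value occurs more than once
all_dups <- forward | backward        # TRUE FALSE TRUE
```

The outer rev() matters: without it the backward flags would refer to positions in the reversed vector.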
Answered by Richard Telford (edited by David Maust)
  • That solved my problem... thanks. If possible, could you please explain duplicated(x)|rev(duplicated(rev(x)))? I have just started learning R. – Arshad Islam Mar 19 '16 at 21:12
  • The vertical bar means element-wise OR, and rev() reverses the order of the vector, so duplicated() starts looking for duplicates from the other end; the outer rev() then puts the result back in the original order. The solution by akrun is more elegant. – Richard Telford Mar 19 '16 at 21:59

The duplicated function has an argument fromLast = TRUE that checks for duplicates starting from the end. With it, the last occurrence of each value is FALSE and all earlier occurrences are TRUE. Combining both checks with | ensures that all the duplicate elements are included.
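A small sketch contrasting the two scan directions on a hypothetical vector in which "a" occurs three times. The variable names from_start and from_end are only for illustration.

```r
x <- c("a", "b", "a", "c", "a")

# Default: scan from the start; the FIRST "a" is FALSE, later copies TRUE
from_start <- duplicated(x)                   # FALSE FALSE TRUE FALSE TRUE

# fromLast = TRUE: scan from the end; the LAST "a" is FALSE, earlier copies TRUE
from_end <- duplicated(x, fromLast = TRUE)    # TRUE FALSE TRUE FALSE FALSE
```

Each direction misses exactly one copy of each repeated value, and the two missed copies are different, so their OR covers every copy.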

 duplicated(x) | duplicated(x, fromLast = TRUE)

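returns TRUE for every element whose value occurs more than once, including its first occurrence, so it can be used to get all the duplicate elements.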
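A minimal sketch of applying this to the question's problem, assuming a hypothetical customer data frame with made-up keys and a hypothetical amount column. Subsetting with the combined flags keeps every row whose key is repeated, so no key appears only once in x1.

```r
# Hypothetical stand-in for the question's customer data
customer <- data.frame(
  device_store_product_date = c("k1", "k2", "k1", "k3", "k3", "k3"),
  amount = c(10, 20, 10, 5, 5, 7),
  stringsAsFactors = FALSE
)

key <- customer$device_store_product_date

# TRUE for every row whose key occurs more than once, first occurrence included
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
x1 <- customer[is_dup, ]

# Count rows per key in x1: k1 appears twice, k3 three times, k2 is dropped
counts <- table(x1$device_store_product_date)
```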

Answered by akrun