I want to know whether it is possible to isolate duplicate records based on a date value, but within groups. Essentially, for a given Title and Title_ID, I want to extract the records (across categories) whose Date values are the same.

Example:

Title   Title_ID   Category     Date
Title1    2728     Category 1   2013-08-09
Title1    2728     Category 2   2013-10-18
Title1    2728     Category 3   2013-11-05
Title1    2728     Category 4   2013-11-05

Desired Output:

Title   Title_ID   Category     Date
Title1    2728     Category 3   2013-11-05
Title1    2728     Category 4   2013-11-05

Is there a way to accomplish this within R's nifty packages?

Thanks.

mikeymike
  • Your output doesn't make sense with the input data, did you typo? Look at the `duplicated` function and google – astrofunkswag Dec 05 '18 at 19:40
  • The output is to extract those last 2 rows from the main data frame, which gives me Cat 3 & 4 where the dates are the same while controlling for the Title and ID variables. I have a list of 800 titles like this and need to isolate every instance of matching dates with respect to the other qualitative fields. I have tried using the `duplicated` function, but the output gives me only one instance. I need both rows within the group for further data manipulation. Does that make sense? – mikeymike Dec 05 '18 at 19:48
  • Yes, but your main description is confusing. There are lots of resources related to your issue: https://stackoverflow.com/questions/7854433/finding-all-duplicate-rows-including-elements-with-smaller-subscripts https://stackoverflow.com/questions/6986657/find-duplicated-rows-based-on-2-columns-in-data-frame-in-r – astrofunkswag Dec 05 '18 at 19:59
  • Again, I do not want ANY duplicate values. I am only looking for duplicate dates for a given Title and ID. I have searched ALL through Stack Overflow for similar cases and had already seen the related questions you mentioned, which is why I know the `duplicated` function in the second question did not accomplish my task. Again, with a list of 800 titles, I am only trying to find duplicate date values with respect to the Title & ID. – mikeymike Dec 05 '18 at 20:06

1 Answer

The two links I sent you in the comments are used together for this solution.

The first link shows you how to get all duplicate indices, not just the later occurrences, by combining the `fromLast` argument with the `|` operator. The second shows you how to check for duplication across multiple columns. Put together, you check for rows that share the same Title, Title_ID, and Date values.
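To see why both calls are needed, here is a minimal sketch on a made-up vector (the values in `x` are purely illustrative and not from your data). `duplicated()` on its own flags only the second and later occurrences, while `fromLast = TRUE` flags only the earlier ones:

```r
x <- c("a", "b", "b", "c", "b")   # made-up values for illustration

fwd <- duplicated(x)                    # flags repeats when scanning from the start
bwd <- duplicated(x, fromLast = TRUE)   # flags repeats when scanning from the end

# OR-ing the two marks every member of a duplicated set
all_dups <- fwd | bwd
x[all_dups]
```

Neither `fwd` nor `bwd` alone covers all three "b" entries, which is why a single `duplicated()` call returns only one instance per pair.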

The last line of code removes exact duplicates (rows identical in every column) if there are any in your data frame. Your example doesn't include any, and I'm not totally clear from your description whether you want those kept.

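# Columns that define a duplicate: same Title, Title_ID, and Date
key <- c('Title', 'Title_ID', 'Date')

# Flag every row whose key appears more than once (first and later occurrences)
ind <- duplicated(dt[, key]) | duplicated(dt[, key], fromLast = TRUE)

dt2 <- dt[ind, ]

# Drop rows that are exact copies across all columns
dt2[!duplicated(dt2), ]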
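As a quick check, the approach can be run against the example table from the question. This is only a sketch: the data frame is rebuilt by hand, and the column types (character titles, numeric ID, `Date` class for dates) are my assumptions about your data.

```r
# Rebuild the question's example table (column types are assumed)
dt <- data.frame(
  Title    = rep("Title1", 4),
  Title_ID = rep(2728, 4),
  Category = paste("Category", 1:4),
  Date     = as.Date(c("2013-08-09", "2013-10-18", "2013-11-05", "2013-11-05")),
  stringsAsFactors = FALSE
)

# Columns that define the group plus the value checked for repeats
key <- c("Title", "Title_ID", "Date")

# TRUE for every row whose Title/Title_ID/Date combination occurs more than once
ind <- duplicated(dt[, key]) | duplicated(dt[, key], fromLast = TRUE)

dt[ind, ]
```

With this input, `ind` selects the Category 3 and Category 4 rows, matching the desired output in the question.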
astrofunkswag
  • A combo of the solutions you gave did the trick. To be honest, I did not think to use both questions together yesterday when I found them on Stack Overflow. Thank you for the help. Much obliged, sir! – mikeymike Dec 05 '18 at 21:22