I thought I could solve this with data.table, but it looks like it is a bit more difficult. My data frame looks like this:

userID1   A
userID1   A
userID1   B
userID2   A
userID2   A

The output is supposed to look like this:

userID1   A
userID1   B
userID2   A

Basically, I want one row for each unique combination of user and item. Most of the examples I found are about counting unique elements, not about actually extracting them. Any suggestions?

Tom

1 Answer

This is a dplyr solution that should give those results:

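library(dplyr)

# Example data matching the question
df <- data.frame(user = c(rep("userID1", 3), rep("userID2", 2)),
                 group = c("A", "A", "B", "A", "A"),
                 stringsAsFactors = FALSE)

# Keep only the first row within each user/group combination
df <- df %>%
      group_by(user, group) %>%
      filter(row_number() == 1) %>%
      ungroup()  # drop the grouping so later steps see a plain table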
Matt Jewett
  • In this instance, it may be more convenient to use: `df %>% distinct()` – George Wood Jul 04 '17 at 16:52
  • That would work too, assuming there are truly only two columns in the original dataset. In that case, `df[!duplicated(df),]` from base R would work as well. – Matt Jewett Jul 04 '17 at 16:56
  • Indeed. If there are more than two columns: `df %>% distinct(user, group, .keep_all = TRUE)` – George Wood Jul 04 '17 at 16:58
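
For anyone comparing the variants from the comments, here is a rough side-by-side sketch. The extra `value` column is made up purely to show the "more than two columns" case; it is not part of the original data.

```r
library(dplyr)

# Data from the answer, plus a hypothetical third column `value`
# (an assumption, added only to illustrate the multi-column case)
df <- data.frame(user  = c(rep("userID1", 3), rep("userID2", 2)),
                 group = c("A", "A", "B", "A", "A"),
                 value = 1:5,
                 stringsAsFactors = FALSE)

# Two-column case: every fully duplicated row is dropped
two_col        <- df[, c("user", "group")]
res_distinct   <- two_col %>% distinct()
res_duplicated <- two_col[!duplicated(two_col), ]

# More columns: deduplicate on user/group only, keeping the first
# row of each combination together with its other columns
res_keep_all <- df %>% distinct(user, group, .keep_all = TRUE)
res_base     <- df[!duplicated(df[, c("user", "group")]), ]
```

All four results should have three rows (userID1/A, userID1/B, userID2/A). In the multi-column variants, the other columns come from the first row seen for each combination.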