0

I have a datable in which I have several IDs that are all assigned to a group. But some IDs appear in multiple groups:

library(data.table)
df = data.table(ID = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D'),
                Group = c(1, 1, 2, 2, 2, 2, 2, 3, 3, 1, 1))

 #     ID       Group
 #  1:  A       1
 #  2:  A       1
 #  3:  A       2
 #  4:  B       2
 #  5:  B       2
 #  6:  B       2
 #  7:  C       2
 #  8:  C       3
 #  9:  C       3
 #  10:  D      1
 #  11:  D      1

Now I would like to move IDs that appear in multiple groups to the group in which it has the highest number of observations. So, multiple unique IDs can be present in one group, just not unique IDs in multiple groups. Like this:

 #     ID       Group
 #  1:  A       1
 #  2:  A       1
 #  3:  A       1
 #  4:  B       2
 #  5:  B       2
 #  6:  B       2
 #  7:  C       3
 #  8:  C       3
 #  9:  C       3
 #  10:  D      1
 #  11:  D      1
BWolk
  • 193
  • 1
  • 9

1 Answers1

2

You can get the Mode value for each ID.

library(data.table)

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df[, Group := Mode(Group), ID]
df

#    ID Group
# 1:  A     1
# 2:  A     1
# 3:  A     1
# 4:  B     2
# 5:  B     2
# 6:  B     2
# 7:  C     3
# 8:  C     3
# 9:  C     3
#10:  D     1
#11:  D     1
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213