Suppose I have a large data.table that looks like dt below.
library(data.table)

dt <- data.table(
  player_1 = c("a", "b", "b", "c"),
  player_1_age = c(10, 20, 20, 30),
  player_2 = c("b", "a", "c", "a"),
  player_2_age = c(20, 10, 30, 10)
)
# dt
#    player_1 player_1_age player_2 player_2_age
# 1:        a           10        b           20
# 2:        b           20        a           10
# 3:        b           20        c           30
# 4:        c           30        a           10
From the dt above, I would like to create a data.table of unique players and their ages, like player_dt below:
# player_dt
# player age
# a 10
# b 20
# c 30
To do so, I've tried the code below, but it takes too long on my larger dataset, probably because I am creating a data.table for each iteration of sapply.
How would you get the player_dt above, while checking for each player that there is only one unique age value?
# get unique players
player <- sort(unique(c(dt$player_1, dt$player_2)))
# for each player, get their age, if there is only one age value
age <- sapply(player, function(x) {
  unique_values <- unique(c(
    dt[player_1 == x][["player_1_age"]],
    dt[player_2 == x][["player_2_age"]]))
  # abort if this player has more than one distinct age
  if (length(unique_values) > 1) stop("player ", x, " has more than one age value")
  unique_values
})
# combine to create the player_dt
player_dt <- data.table(player, age)
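
For comparison, here is a minimal sketch of one vectorized alternative that avoids subsetting dt once per player. It assumes the two player/age column pairs can simply be stacked into one long table. The names long_dt and conflicts are hypothetical helpers introduced for illustration, and I have not benchmarked this on a large dataset.

```r
library(data.table)

dt <- data.table(
  player_1 = c("a", "b", "b", "c"),
  player_1_age = c(10, 20, 20, 30),
  player_2 = c("b", "a", "c", "a"),
  player_2_age = c(20, 10, 30, 10)
)

# stack both player/age column pairs into one long table, then drop duplicate rows
# (long_dt is a hypothetical intermediate name)
long_dt <- unique(rbind(
  dt[, .(player = player_1, age = player_1_age)],
  dt[, .(player = player_2, age = player_2_age)]
))

# after deduplication, a player that still appears more than once has conflicting ages
conflicts <- long_dt[, .N, by = player][N > 1, player]
if (length(conflicts) > 0) {
  stop("players with more than one age value: ", paste(conflicts, collapse = ", "))
}

# one row per player, sorted by player
player_dt <- long_dt[order(player)]
```

The uniqueness check here runs once over the whole table rather than once per player, which is where the sapply version likely spends most of its time.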