1

I got a table in which I would like to replace values of a single column using a lookup-table.

Using lapply this works fine. However, with a huge dataset it runs hours.

Is there a faster alternative? Thanks!

MWE:

require(data.table)

# Dummy data. 
dt_lookup <- data.table(cls_old=c(1:5), cls_new=c(5:1)) # Lookup-Table, the real data has different and non-continous entries.
dt <- data.table(cls=c(5:1), data=c(1,2,3,4,5)) # Table in which data shall be replaced depending on the lookup-table.

# Function to get the new column entry for every row based on the lookup-table.
get_new_label <- function(cls) {
  return(dt_lookup[cls_old==cls]$cls_new)
}

# Actual replacement of values.
dt <- dt[,cls:=lapply(cls, get_new_label)]
user2894356
  • 141
  • 1
  • 10
  • 3
    I think this is commonly referred to as "_update join_", so use that in your search (together with R and data.table) and ye shall find. – Henrik Aug 19 '20 at 13:18

2 Answers2

3

If I'm not misunderstanding something, you can do a simple join:

dt[dt_lookup, cls := i.cls_new, on = .(cls = cls_old)]
dt
#   cls data
#1:   1    1
#2:   2    2
#3:   3    3
#4:   4    4
#5:   5    5

You should really spend some time and study the data.table vignettes and documentation.

Roland
  • 127,288
  • 10
  • 191
  • 288
1

You can use match to realize a lookup-table.

i <- match(dt$cls, dt_lookup$cls_old)
j <- !is.na(i)
dt$cls[j] <- dt_lookup$cls_new[i[j]]
dt
#   cls data
#1:   1    1
#2:   2    2
#3:   3    3
#4:   4    4
#5:   5    5

Also have a look at fast R lookup table.

GKi
  • 37,245
  • 2
  • 26
  • 48