Suppose that I have the following data
df <- structure(list(car_model = c(301, 302, 303, 304), colour = c(501,
502, 503, 504), sales = c(182, 191, 302, 101)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
and I have a single lookup table where I will get the texts to replace the codes in the columns car_model
and colour
.
tbl1 <- structure(list(txt = c("A", "B", "C", "Y"), cod = c(301, 302,
303, 304), var = c("car_model", "car_model", "car_model", "car_model"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
tbl2 <- structure(list(txt = c("black", "green", "red", "white"), cod = c(501,
502, 503, 504), var = c("colour", "colour", "colour", "colour"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
combining the two tables I have
tbl <- rbind(tbl1,tbl2)
# A tibble: 8 x 3
txt cod var
<chr> <dbl> <chr>
1 A 301 car_model
2 B 302 car_model
3 C 303 car_model
4 Y 304 car_model
5 black 501 colour
6 green 502 colour
7 red 503 colour
8 white 504 colour
Is there a way to replace all the columns in the main df
using the lookup table this way (matching the column names by the value in the columns var
and cod
) or I need to make separate tables, one for each variable? Another doubt that I have is if it's reasonable to do it in a dataset with ~10 million rows, 30 or more variables and a lookup table with total size ~ 5 thousand rows.
EDIT: About the codes is possible to have the same code in different variables.
EDIT2: I'm looking for a fast and memory efficient solution. Maybe some solution with data.table