To start with, I define a toy example of what I assume you would like to achieve:
Given
## Toy Library
library <- data.frame(variable = c("V1", "V1", "V2", "V2", "V2"),
name = c("var name 1", "var name 1", "var name 2", "var name 2", "var name 2"),
code = c(1,2,1,2,3),
label = c("x1", "x2", "A", "B", "C"))
## Toy Data
data <- data.frame(Subj = seq(1,10),
V1 = sample(c(1,2), 10, replace = TRUE),
V2 = sample(c(1,2,3), 10, replace = TRUE))
you would like to have a new data frame data_recoded
:
data_recoded
# Subj var name 1 var name 2
# 1 1 x1 C
# 2 2 x2 C
# 3 3 x2 B
# 4 4 x2 C
# 5 5 x2 C
# 6 6 x1 B
# 7 7 x1 B
# 8 8 x1 C
# 9 9 x1 C
# 10 10 x1 A
? (Note that this new data frame contains non-syntactic variable names.)
R provides the "factor" data type to conveniently work with coded categorical data. By converting each column in your data frame to a factor, you can "recode" the codes with labels.
First, you need to extract the factor levels and labels for each variable from your library, e.g. with (using the package dplyr):
library(dplyr)
## Create dictionary of factor levels and labels from toy library
dictionary <- library %>%
group_by(variable) %>%
summarise(codes = list(code),
levels = list(label),
names = unique(name))
Next, you need a function that recodes a single data frame column, which you can later apply on all columns. For example:
## Function to replace codes with labels
recode_variable <- function(var) {
## Get column 'var' from data frame
column <- data[, var]
## Get factor levels and labels for variable 'var'
varlevels <- dictionary %>% filter(variable == var)
## Create new data frame from column as factor
df <- data.frame(name = factor(column,
levels = varlevels$codes[[1]],
labels = varlevels$levels[[1]])
)
## Replace column name with name from dictionary
names(df) <- varlevels$names
## Return data frame
return(df)
}
And finally, you can extract the names of the columns you want to recode
## Get original names of columns to be recoded
columnnames <- names(data) %>% setdiff("Subj")
and use the map_dfc()
function from the package purrr to apply the function to all selected columns and cbind the resulting data frame columns together:
library(purrr)
## Recode columns
data_recoded <- columnnames %>%
## Apply function to all variables in columnames, cbind columns to new data frame
map_dfc(~ recode_variable(.)) %>%
## Add Subject ID back as first column
mutate(Subj = data$Subj, .before = 1)
data_recoded
# Subj var name 1 var name 2
# 1 1 x1 C
# 2 2 x2 C
# 3 3 x2 B
# 4 4 x2 C
# 5 5 x2 C
# 6 6 x1 B
# 7 7 x1 B
# 8 8 x1 C
# 9 9 x1 C
# 10 10 x1 A