I have two datasets that refer to the same data. However, one has strings as answers to questions, and the other has the corresponding codes.

library(data.table)
dat_string <- fread("str_col1 str_col2 numerical_col
                     One   Alot          1
                     Two   Alittle       0")     

dat_codes <- fread("code_col1 code_col2 numerical_col
                     0     3    1
                     1     5    0")

I would like to combine both datasets so that the strings get attached to the corresponding codes as labels (see this example), for all string columns in dat_string.

Please note that the column names can have any format and do not necessarily follow the format from the example.

What would be the easiest way to do this?

Desired outcome:

dat_codes$code_col1 <- factor(dat_codes$code_col1, levels=c("0", "1"),
labels=c("One", "Two"))    

attributes(dat_codes$code_col1)$levels
[1] "One" "Two"
Tom

1 Answer

If I understand your edit correctly, both tables have the same shape and the same row order; one holds the labels (the strings) and the other holds the levels (the codes). If that is the case, it should be even more straightforward than my original response:

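# Positions of the string columns; the same positions hold the codes in dat_codes
code_cols <- which(sapply(dat_string, is.character))

for (j in code_cols) {
    # Replace each code column, by reference, with a factor whose levels are the
    # codes and whose labels are the strings. This relies on both tables having
    # the same row order, so unique() returns codes and strings in matching order.
    set(dat_codes, j = j, value = factor(
        dat_codes[[j]],
        levels = unique(dat_codes[[j]]),
        labels = unique(dat_string[[j]])
    ))
}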


dat_codes
#    code_col1 code_col2 numerical_col
# 1:       One      Alot             1
# 2:       Two   Alittle             0

dat_codes$code_col1
# [1] One Two
# Levels: One Two

sapply(dat_codes, class)
# code_col1     code_col2 numerical_col
#  "factor"      "factor"     "integer"
SamR
  • Hey SamR, thank you for your answer. I noticed that I made a mistake in my question because I confused factor values/levels/labels. I am going to update my question. My apologies for the confusion. – Tom Oct 12 '22 at 10:38
  • I made some updates to the question if you are still interested. – Tom Oct 12 '22 at 10:46
  • @Tom I've updated my reply in response to your edit – SamR Oct 12 '22 at 13:19
  • Hi Sam, do you perhaps know of any way to allow for `NA`'s in the levels (it appears a couple of my columns have them)? I have tried to wrap an `addNA()` around `unique(SME_codes[[j]])` as in `levels = addNA(unique(SME_codes[[j]]))`, but that does not work. – Tom Oct 14 '22 at 08:43
  • I don't have access to a PC today but I think if you make the `labels` argument `na.omit(unique(dat_string[[j]]))` it should work. – SamR Oct 14 '22 at 10:18
  • I fixed it with `labels = unique(dat_codes[[j]][which(!is.na(dat_codes[[j]]))])`, in order to remove the NA – Tom Oct 14 '22 at 12:45
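
For readers hitting the same `NA` problem, here is a minimal sketch of one possible way to handle it. It is not the method from the answer or the comments: instead of calling `unique()` on each table separately, it pairs each code with its string row by row and drops incomplete pairs before building the factor. The example data (a missing answer in the second row) and the helper name `map` are made up for illustration, and the sketch assumes, as in the answer, that both tables have the same shape and row order.

```r
library(data.table)

# Hypothetical example data: the second row has a missing answer in both tables
dat_string <- data.table(str_col1 = c("One", NA, "Two"),
                         numerical_col = c(1L, 0L, 1L))
dat_codes  <- data.table(code_col1 = c(0L, NA, 1L),
                         numerical_col = c(1L, 0L, 1L))

# Positions of the string columns (assumed to match the code columns)
str_cols <- which(sapply(dat_string, is.character))

for (j in str_cols) {
    # Pair each code with the string in the same row, then drop
    # rows containing NA and duplicated code/label pairs
    map <- unique(na.omit(data.table(code  = dat_codes[[j]],
                                     label = dat_string[[j]])))
    # NA codes match no level, so they stay NA in the resulting factor
    set(dat_codes, j = j, value = factor(dat_codes[[j]],
                                         levels = map$code,
                                         labels = map$label))
}

levels(dat_codes$code_col1)
# [1] "One" "Two"
```

Because the mapping is built from row-wise pairs, it does not depend on the codes and strings first appearing in the same order in both tables, and `NA` never ends up among the levels or labels.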