
I have a dataframe where I tried to assign variables "I" for each unique value in the "id" column using the code below. However, the assigned values are treated as characters instead of variables. How can I convert them into variables? I am looking for a more efficient approach as my actual dataset has a larger number of individuals, making manual assignment impractical.

library(dplyr)

df <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
)
I1 <- 1
I2 <- 2
I3 <- 3
df2 <- df %>% 
  dplyr::mutate(I = paste0("I", as.integer(factor(id))))

df2
   id  I
1   a I1
2   a I1
3   a I1
4   b I2
5   b I2
6   b I2
7   c I3
8   c I3
9   c I3
10  c I3

One workaround I found is the code below, but I believe there should be a more efficient solution.

df3 <- df2 %>% 
  dplyr::mutate(I = case_when(
    id == "a" ~ I1,
    id == "b" ~ I2,
    id == "c" ~ I3
  ))
df3
   id I
1   a 1
2   a 1
3   a 1
4   b 2
5   b 2
6   b 2
7   c 3
8   c 3
9   c 3
10  c 3

I would appreciate any idea to do this effectively. Thank you.

TKH_9
  • How is your solution inefficient? – joran Jul 04 '23 at 00:52
  • @joran, I have data with over 100 individuals, which means I would have to write over 100 lines of `id == "x" ~ Ix` with my current method. I thought there should be another way to do this, because programming is good at repetitive tasks. – TKH_9 Jul 04 '23 at 00:59
  • So then it sounds like the inefficient piece is that you've stored related information (all the `I1`, `I2` variables) in separate data structures, rather than, say in a named list, where the names correspond to the appropriate id values. Then you could probably do what you want in one line via some clever indexing. – joran Jul 04 '23 at 01:07
  • If you have your "I" values in a data.frame, i.e. a "lookup table", you can use any of the ~20 methods listed here: https://stackoverflow.com/questions/67081496/canonical-tidyverse-method-to-update-some-values-of-a-vector-from-a-look-up-tabl – jared_mamrot Jul 04 '23 at 01:10
  • @jared_mamrot, Thank you. This is what I wanted. – TKH_9 Jul 04 '23 at 05:44
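
The named-structure idea from the comments could be sketched roughly as follows. This is only an illustration, not necessarily what the commenters had in mind: it assumes the variables `I1`, `I2`, ... already exist, each holds a single number, and they are numbered in the same order as `factor(id)` sorts the ids. The names `lookup` and `var_names` are made up for this example.

```r
# Stand-ins for the separately defined variables from the question.
I1 <- 1
I2 <- 2
I3 <- 3

df <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
)

# Collect the variables into one named vector, c(I1 = 1, I2 = 2, I3 = 3).
# mget() looks objects up by name; inherits = TRUE also searches enclosing scopes.
n_ids <- length(unique(df$id))
lookup <- unlist(mget(paste0("I", seq_len(n_ids)), inherits = TRUE))

# Build the variable names as in the question, then index the lookup by name.
var_names <- paste0("I", as.integer(factor(df$id)))
df$I <- unname(lookup[var_names])
```

Once the values live in one named vector, the number of ids no longer matters, since the indexing step stays a single line.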

2 Answers


I agree with @joran: the inefficient part is that you have stored all these values in separate variables (`I1`, `I2`, `I3`). Store them together in a single data structure (e.g. a vector or a data frame) and there are many easy solutions.

One of them is to use `match`.

library(dplyr)

vec <- c(1, 2, 3)

df %>% mutate(I = vec[match(id, unique(id))])

#   id I
#1   a 1
#2   a 1
#3   a 1
#4   b 2
#5   b 2
#6   b 2
#7   c 3
#8   c 3
#9   c 3
#10  c 3
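
One detail worth checking in your own data: `match(id, unique(id))` numbers the ids in order of first appearance, whereas `as.integer(factor(id))` in the question numbers them in sorted order. The two agree for the example above because the ids already appear sorted. A small sketch with made-up, unsorted data (the values in `vec` are hypothetical):

```r
df <- data.frame(id = c("b", "b", "a", "c", "a"))
vec <- c(10, 20, 30)  # hypothetical values intended for a, b, c in that order

# Order of first appearance: unique(df$id) is b, a, c.
first_seen <- vec[match(df$id, unique(df$id))]

# Sorted order: sort(unique(df$id)) is a, b, c, matching factor(df$id).
sorted <- vec[match(df$id, sort(unique(df$id)))]
```

Here `first_seen` gives "b" the value 10, while `sorted` gives "b" the value 20, so pick whichever ordering your values were defined in.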
Ronak Shah

Alternatively, try the `data.table::rleid` approach:

library(dplyr)

# rleid() numbers consecutive runs of identical id values
df <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
) %>% mutate(i = paste0("I", data.table::rleid(id)))

Created on 2023-07-04 with reprex v2.0.2

   id  i
1   a I1
2   a I1
3   a I1
4   b I2
5   b I2
6   b I2
7   c I3
8   c I3
9   c I3
10  c I3
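
Note that a run-length id numbers consecutive runs rather than distinct values, so it matches the per-id numbering only when each id's rows are contiguous. The sketch below illustrates the difference using base R's `rle()` as a stand-in for `rleid()` on a single vector, so that it needs no extra package; the data is made up.

```r
id <- c("a", "a", "b", "a")  # "a" appears in two separate runs

# Run-based numbering: a new number starts whenever the value changes.
runs <- rle(id)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Value-based numbering: the same id always gets the same number.
value_id <- match(id, unique(id))
```

The last "a" gets run number 3 but value number 1, so sorting by `id` first (or using `match`) is the safer choice if the rows might be interleaved.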

jkatam