
I'm working with the ToothGrowth data:

library(datasets)
data(ToothGrowth)

It has three columns: length (`len`), supplement (`supp`) and dose (`dose`). I want to add a fourth column with a categorical variable that depends on the dose, for example if dose = 0.5 then "D5", if dose = 1 then "D1". I tried the following:

data(ToothGrowth)
df_TD <- ToothGrowth
dosiscatg <- NULL
for(i in 1:nrow(df_TD)) {
  if df_TD$dose==0.5 {
    dosiscatg <- c(dosiscatg, "D0.5")
  } else if df_TD$dose==1 {
    dosiscatg <- c(dosiscatg, "D1")
  } else if df_TD$dose==2 {
    dosiscatg <- c(dosiscatg, "D2")
  }
}

But I keep getting an error at the braces "{}", and I also don't know whether the code is otherwise correct.

Balastrong
Begdev
  • 1
    Things like this should be done on whole vectors rather than in a `for` loop; any tutorial that recommends otherwise either has an ulterior motive (teaching programming in general, or showing how "inefficient R is", which is not true) or has low credibility as a source of good practices in R. You can use something like `df_TD$newcol <- cut(ToothGrowth$dose, c(0, 0.9, 1.5, 3), labels = c("D5", "D1", "D2"))`; the result is a factor, so wrap it in `as.character(.)` if you want a literal string. – r2evans Nov 07 '21 at 20:17

3 Answers

1

Using dplyr:

library(dplyr)

df_TD <- df_TD %>%
  mutate(dosiscatg = case_when(
    dose==0.5 ~ 'D0.5',
    dose==1   ~ 'D1',
    dose==2   ~ 'D2',
    TRUE ~ NA_character_
  ))
  • 1
    While it works here, take caution: equality of floating-point works until it doesn't, and when it doesn't, it does so *silently*. Refs: https://stackoverflow.com/q/9508518, https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754 – r2evans Nov 07 '21 at 20:18
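To make the floating-point caveat above concrete, here is a minimal base R sketch. The helper `is_near()` and its tolerance of `1e-8` are assumptions introduced for illustration, not part of dplyr or of the answer above (dplyr's `near()` serves a similar purpose).

```r
# Values that print the same can still differ in their last bits.
x <- 0.1 + 0.2
exact_equal <- x == 0.3                    # FALSE with IEEE 754 doubles
tol_equal   <- isTRUE(all.equal(x, 0.3))   # TRUE: compared within a tolerance

# Hypothetical helper: vectorised comparison within an absolute tolerance.
is_near <- function(v, target, tol = 1e-8) abs(v - target) < tol

# Label the doses without relying on exact equality.
dose <- ToothGrowth$dose
dosiscatg <- rep(NA_character_, length(dose))
dosiscatg[is_near(dose, 0.5)] <- "D0.5"
dosiscatg[is_near(dose, 1)]   <- "D1"
dosiscatg[is_near(dose, 2)]   <- "D2"

table(dosiscatg)
```

For ToothGrowth the doses are stored exactly, so `==` and the tolerance-based version agree; the difference only shows up when the values come out of arithmetic.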
1

Here is another option using mutate and cut:

library(dplyr)
df_TD %>%
  mutate(dosiscatg = cut(dose, breaks = c(0, 0.5, 1.0, 2.0), labels = c("D0.5", "D1", "D2")))
    len supp dose dosiscatg
1   4.2   VC  0.5      D0.5
2  11.5   VC  0.5      D0.5
3   7.3   VC  0.5      D0.5
4   5.8   VC  0.5      D0.5
5   6.4   VC  0.5      D0.5
6  10.0   VC  0.5      D0.5
7  11.2   VC  0.5      D0.5
8  11.2   VC  0.5      D0.5
9   5.2   VC  0.5      D0.5
10  7.0   VC  0.5      D0.5
11 16.5   VC  1.0        D1
12 16.5   VC  1.0        D1
13 15.2   VC  1.0        D1
14 17.3   VC  1.0        D1
15 22.5   VC  1.0        D1
16 17.3   VC  1.0        D1
17 13.6   VC  1.0        D1
18 14.5   VC  1.0        D1
19 18.8   VC  1.0        D1
20 15.5   VC  1.0        D1
21 23.6   VC  2.0        D2
22 18.5   VC  2.0        D2
23 33.9   VC  2.0        D2
24 25.5   VC  2.0        D2
25 26.4   VC  2.0        D2
26 32.5   VC  2.0        D2
27 26.7   VC  2.0        D2
28 21.5   VC  2.0        D2
29 23.3   VC  2.0        D2
30 29.5   VC  2.0        D2
31 15.2   OJ  0.5      D0.5
32 21.5   OJ  0.5      D0.5
33 17.6   OJ  0.5      D0.5
34  9.7   OJ  0.5      D0.5
35 14.5   OJ  0.5      D0.5
36 10.0   OJ  0.5      D0.5
37  8.2   OJ  0.5      D0.5
38  9.4   OJ  0.5      D0.5
39 16.5   OJ  0.5      D0.5
40  9.7   OJ  0.5      D0.5
41 19.7   OJ  1.0        D1
42 23.3   OJ  1.0        D1
43 23.6   OJ  1.0        D1
44 26.4   OJ  1.0        D1
45 20.0   OJ  1.0        D1
46 25.2   OJ  1.0        D1
47 25.8   OJ  1.0        D1
48 21.2   OJ  1.0        D1
49 14.5   OJ  1.0        D1
50 27.3   OJ  1.0        D1
51 25.5   OJ  2.0        D2
52 26.4   OJ  2.0        D2
53 22.4   OJ  2.0        D2
54 24.5   OJ  2.0        D2
55 24.8   OJ  2.0        D2
56 30.9   OJ  2.0        D2
57 26.4   OJ  2.0        D2
58 27.3   OJ  2.0        D2
59 29.4   OJ  2.0        D2
60 23.0   OJ  2.0        D2
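The breaks above work because `cut()` builds right-closed intervals by default, so each dose sits on the upper edge of its own bin. A small sketch of that behaviour, using a three-element vector `doses` made up for illustration:

```r
breaks <- c(0, 0.5, 1, 2)
doses  <- c(0.5, 1, 2)   # illustrative values, one per dose level

# Default (right = TRUE): bins are (0,0.5], (0.5,1], (1,2]
default_bins <- cut(doses, breaks = breaks)
levels(default_bins)

# Same bins with readable labels, as in the answer above
labelled <- cut(doses, breaks = breaks, labels = c("D0.5", "D1", "D2"))
as.character(labelled)

# right = FALSE flips the closed side: [0,0.5), [0.5,1), [1,2)
# so 0.5 lands in the second bin and 2 falls outside every bin (NA)
left_closed <- cut(doses, breaks = breaks, right = FALSE,
                   labels = c("D0.5", "D1", "D2"))
as.character(left_closed)
```

Placing the breaks between the dose values, as in the `c(0, 0.9, 1.5, 3)` suggestion from the comments, avoids depending on which side of the interval is closed.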
TarJae
0

Try this:

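data(ToothGrowth)
df_TD <- ToothGrowth
dosiscatg <- NULL
for (i in seq_len(nrow(df_TD))) {
  # `if` needs its condition in parentheses, and the condition must be a
  # single value, so index the current row with [i]
  if (df_TD$dose[i] == 0.5) {
    dosiscatg <- c(dosiscatg, "D0.5")
  } else if (df_TD$dose[i] == 1) {
    dosiscatg <- c(dosiscatg, "D1")
  } else if (df_TD$dose[i] == 2) {
    dosiscatg <- c(dosiscatg, "D2")
  }
}
# Attach the result as the fourth column
df_TD$dosiscatg <- dosiscatg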

Edit: As people have pointed out, this only corrects the syntax problems in the code; the loop-based approach as a whole is not encouraged.
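For comparison, a vectorised equivalent might look like the sketch below. It assumes the only goal is a "D" prefix on each dose value; the column name `dosiscatg_f` is made up for illustration.

```r
df_TD <- ToothGrowth

# One call operates on the whole column: no loop, no growing a vector
df_TD$dosiscatg <- paste0("D", df_TD$dose)

# Alternative: a factor with explicit levels, if a categorical type is wanted
df_TD$dosiscatg_f <- factor(df_TD$dose,
                            levels = c(0.5, 1, 2),
                            labels = c("D0.5", "D1", "D2"))

head(df_TD)
```

Besides being shorter, this avoids `c(dosiscatg, ...)` inside the loop, which copies the vector on every iteration.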

Dave4048
  • 1
    IMHO, as @r2evans says in his comment, using a for loop for this sort of thing should not be encouraged, even if it is what OP requests. By all means correct OP's syntax if appropriate, but I feel a _complete_ answer would include appropriate vectorised code and an explanation of why it is better. Benchmark tests would complete the answer. – Limey Nov 07 '21 at 20:24
  • Thank you, I'll keep it in mind next time! – Dave4048 Nov 07 '21 at 20:26