
How do I recode a categorical variable into a new categorical variable? Currently, I have the variable prob_dementia coded 1 (probable dementia), 2 (no dementia) and -8 (do not know). I have already changed -8 to NA and filtered out the NAs. I need to make a new variable, dementia, coded 0 (no dementia) and 1 (dementia). I have treated it as if it were a continuous variable, to no avail.

I attempted:

data3 <- data2 %>% 
  mutate(dementia = case_when(data1$prob_dementia == 2 ~ 'no_dementia', 
                              data1$prob_dementia == 1 ~ 'dementia'))

I was expecting to at least get the variable labels.

Rui Barradas
  • 1) start by removing `data1$` from `case_when`; 2) Can you post sample data? Please edit the question with the output of `dput(data2)`. Or, if it is too big with the output of `dput(head(data2, 10))`. – Rui Barradas Aug 23 '23 at 18:26
  • When I use dput(data2), it shows me 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, ... – user22437104 Aug 23 '23 at 18:33
    You can [edit](https://stackoverflow.com/posts/76964007/edit) the question to include a codeblock with that `dput` output. – Seth Aug 23 '23 at 18:37
  • Why do you have a pipe starting with `data2` but `data1` in `case_when`? Also, to make the output of `dput` smaller you can edit the question with the output of `dput(head(data2["prob_dementia"]))`. – Rui Barradas Aug 23 '23 at 18:40
  • I have removed data1 from the case_when – user22437104 Aug 23 '23 at 18:42
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 23 '23 at 18:57

1 Answer


An alternative to case_when is if_else. You can see an example of this with one of the built-in datasets, for example chickwts, where the variable feed is categorical. If the goal is to create a new dummy variable based on the value of feed, then if_else serves the purpose. However, case_when is the better choice when there are more than two conditions. I've included two extra commands (glimpse and table) so you can check your work.

# package library 
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# preview the built-in sample data 
head(chickwts)
#>   weight      feed
#> 1    179 horsebean
#> 2    160 horsebean
#> 3    136 horsebean
#> 4    227 horsebean
#> 5    217 horsebean
#> 6    168 horsebean

# inspect the sample data 
# there are two variables, feed is a categorical variable 
glimpse(chickwts)
#> Rows: 71
#> Columns: 2
#> $ weight <dbl> 179, 160, 136, 227, 217, 168, 108, 124, 143, 140, 309, 229, 181…
#> $ feed   <fct> horsebean, horsebean, horsebean, horsebean, horsebean, horsebea…

# recode a categorical variable into a new indicator variable
dat_updated <- chickwts %>%
  mutate(
    # if the feed is casein then d_casein is TRUE
    d_casein = if_else(
      condition = feed == "casein",
      true = TRUE,
      false = FALSE
    )
  )

# inspect with a contingency table
table(dat_updated$feed, dat_updated$d_casein)
#>            
#>             FALSE TRUE
#>   casein        0   12
#>   horsebean    10    0
#>   linseed      12    0
#>   meatmeal     11    0
#>   soybean      14    0
#>   sunflower    12    0

Created on 2023-08-23 with reprex v2.0.2
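A minimal sketch of the same recode applied to the question's variable, assuming `data2` holds an integer `prob_dementia` column coded 1/2 (as the `dput` output in the comments suggests). The sample `data2` below is hypothetical. Inside `mutate`, columns are referenced by bare name rather than through `data1$`, and `factor()` with `levels`/`labels` attaches the text labels:

```r
library(dplyr)

# Hypothetical sample data standing in for data2:
# prob_dementia is 1 = probable dementia, 2 = no dementia (-8 already removed)
data2 <- data.frame(prob_dementia = c(1L, 1L, 2L, 1L, 2L, 2L, 1L))

data3 <- data2 %>%
  mutate(
    # numeric 0/1 indicator: 1 = dementia, 0 = no dementia
    dementia = case_when(
      prob_dementia == 1 ~ 1L,
      prob_dementia == 2 ~ 0L
    ),
    # labelled factor built from the indicator; "no dementia" is the first level
    dementia_f = factor(dementia, levels = c(0, 1),
                        labels = c("no dementia", "dementia"))
  )

# cross-check the new variables against the original codes
table(data3$prob_dementia, data3$dementia_f)
```

With only two codes, `if_else(prob_dementia == 1, 1L, 0L)` would give the same indicator; case_when scales more cleanly if further codes need handling.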

René
  • Please provide an explanation in your answer of how this answers the question. Also, you included a lot of extra code here, e.g. head and glimpse, which is not necessary. Did you use AI to help generate this answer? – Mike Aug 23 '23 at 19:26