I have some students who wrote a Google Forms survey. The responses are stored as strings: for each question, respondents pick from a drop-down menu with options like:
I do not feel sad
I feel sad some of the times
I often feel sad
I feel sad all the time
This type of question is modeled after the Beck Depression Inventory. The options are scored 0, 1, 2, and 3, respectively.
There are some 20 of these variables.
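To make the scoring concrete, here is a minimal sketch for a single item. It assumes the options are listed in scoring order, so the position of a response in the list, minus one, is its score. The names sad_options and responses are hypothetical, and the three responses are made up for illustration.

```r
# Response options in scoring order (taken from the example above).
sad_options <- c(
  "I do not feel sad",
  "I feel sad some of the times",
  "I often feel sad",
  "I feel sad all the time"
)

# Hypothetical responses from three students.
responses <- c("I often feel sad", "I do not feel sad", "I feel sad all the time")

# match() returns the 1-based position of each response in the key,
# so subtracting 1 gives the 0-3 score; unmatched text becomes NA.
scores <- match(responses, sad_options) - 1L
scores
```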
I have 2 data frames.
df1 has the survey data (the response strings). Here are 2 of those variables:
head(df1[1:7,c('sad','optimism')])
  sad                                       optimism
  <chr>                                     <chr>
1 Throughout the day I sometimes feel sad   I am somewhat optimistic about my future
2 Throughout the day I sometimes feel sad   I am somewhat optimistic about my future
3 Throughout the day I sometimes feel happy I feel discouraged about the future
4 Throughout the day I sometimes feel happy I am optimistic about my future
5 Throughout the day I sometimes feel happy I am somewhat optimistic about my future
6 Throughout the day I sometimes feel happy I am somewhat optimistic about my future
7 Throughout the day I sometimes feel happy I feel discouraged about the future
df2 has the key: the possible responses for each variable.
head(df2[1:4,c('sad','optimism')])
  sad                                   optimism
  <chr>                                 <chr>
1 Throughout the day I feel happy       I am optimistic about my future
2 Throughout the day I sometimes feel … I am somewhat optimistic about my future
3 Throughout the day I sometimes feel … I feel discouraged about the future
4 Throughout the day I feel sad         I feel the future is hopeless and that things cannot …
The variable names are the same in each dataframe.
I want to use dplyr's case_when with pipes to take each variable from df1 and compare it to the appropriate column in df2.
The following code does convert the strings to numbers, but notice that the case_when condition checks an entire row of df2, which is unnecessary. I want to check only the df1$sad variable from the survey against the df2$sad column of the key.
df1 %>% mutate(across(x, ~ case_when(
  # The following lines of code check a given record statement
  # against ALL columns. They should only check the indexed column.
  . %in% df2[2, ] ~ 0, # checks across all variables in df2; I just want to check a single column
  . %in% df2[3, ] ~ 1,
  . %in% df2[4, ] ~ 2,
  . %in% df2[5, ] ~ 3
)))
So, some questions:
- I'm not sure case_when can do this
- If it does, I'm wondering if I need to use some dot notation
- or maybe there is a better solution
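On the last point, one possible direction is sketched below. It swaps the case_when ladder for match(), and uses dplyr's cur_column() inside across() so that each survey column is looked up only in the key column of the same name. The data here are short toy stand-ins, not the real survey text, and the sketch assumes that row i of df2 holds the response scored i - 1. If the key rows are offset (the code above uses rows 2 to 5), the subtraction would need adjusting. The names vars and scored are hypothetical.

```r
library(dplyr)

# Toy stand-ins for the real data (values are illustrative).
# Assumption: row i of df2 holds the response scored i - 1.
df2 <- tibble(
  sad = c("happy", "sometimes happy", "sometimes sad", "sad"),
  optimism = c("optimistic", "somewhat optimistic", "discouraged", "hopeless")
)
df1 <- tibble(
  sad = c("sometimes sad", "happy", "sad"),
  optimism = c("somewhat optimistic", "discouraged", "optimistic")
)

# Columns to score: those present in both data frames.
vars <- intersect(names(df1), names(df2))

scored <- df1 %>%
  mutate(across(
    all_of(vars),
    # cur_column() is the name of the column being processed, so each
    # survey column is compared only with the same-named key column.
    ~ match(.x, df2[[cur_column()]]) - 1L
  ))

scored
```

A response that does not appear in its key column comes back as NA, which may help surface typos or unexpected options in the survey export.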
Possible answers that I don't understand (yet):
- dplyr case_when: this might be the best bet... Not sure how to wrap my head around it all.
- dplyr case_when Programmatically
- dplyr case_when multiple cases: looks promising