I am processing the results of a questionnaire that repeats the same set of questions, each time on a different topic. In other words, the dataset contains subsets of variables that are related by topic. I want to present these results in a flat contingency table, which requires me to transform my data from wide to long format. The problem I face is how to tell R that these relationships exist when converting the data from wide to long format.
The code below relates to four questions answered by five respondents and illustrates my questionnaire.
ID <- c(1, 2, 3, 4, 5)
Aq1 <- c("Yes", "Yes", "Yes", "Yes", "No")
Aq2 <- c("Win", "Lose", "Lose", "Lose", "Win")
Bq1 <- c("No", "No", "No", "No", "Yes")
Bq2 <- c("Lose", "Lose", "Win", "Win", "Win")
The questionnaire contains two topics (A and B). For each topic the same two questions are being asked (q1 and q2). I create a dataframe.
df <- data.frame(ID, Aq1, Aq2, Bq1, Bq2)
From this dataframe I wish to create the following table:
        A         B
        Yes  No   Yes  No
Win     1    1    1    2
Lose    3    0    0    2
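As a sanity check on the counts in this table, the two halves can be reproduced one topic at a time from the wide data with base R's table(). This is only a sketch: it rebuilds the example data so that it runs on its own, and the names q1_levels, q2_levels, tab_A and tab_B are introduced here purely for illustration.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  ID  = c(1, 2, 3, 4, 5),
  Aq1 = c("Yes", "Yes", "Yes", "Yes", "No"),
  Aq2 = c("Win", "Lose", "Lose", "Lose", "Win"),
  Bq1 = c("No", "No", "No", "No", "Yes"),
  Bq2 = c("Lose", "Lose", "Win", "Win", "Win")
)

# Fix the level order so rows and columns match the desired layout
q1_levels <- c("Yes", "No")
q2_levels <- c("Win", "Lose")

# Cross-tabulate q2 (rows) against q1 (columns), separately per topic
tab_A <- table(q2 = factor(df$Aq2, levels = q2_levels),
               q1 = factor(df$Aq1, levels = q1_levels))
tab_B <- table(q2 = factor(df$Bq2, levels = q2_levels),
               q1 = factor(df$Bq1, levels = q1_levels))

print(tab_A)
print(tab_B)
```

Placing tab_A and tab_B side by side gives the four count columns of the table I am after, but this does not scale to many topics, which is why I want the long format.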
I plan to create a flat contingency table using ftable(). This requires me to change the structure of the dataframe from wide to the following long format.
ID  Topic  q1   q2
1   A      Yes  Win
1   B      No   Lose
2   A      Yes  Lose
2   B      No   Lose
etc.
Calling on the reshape2 and dplyr packages, I use:
library(reshape2) # provides melt()
library(dplyr) # provides arrange()
df_long <- melt(df, id.vars = c("ID", "Aq2", "Bq2"), value.name = "q1") # from reshape2-package
Notice the warning message:
"attributes are not identical across measure variables; they will be dropped"
df_long$Topic <- substr(df_long$variable, start = 1, stop = 1) # creating a vector with topics A and B
df_long$q2 <- c(Aq2, Bq2) # manually constructing "q2"
df_long <- df_long[,-c(2:4)] # dropping the columns "Aq2", "Bq2" and "variable"
df_long <- df_long[, c(1,3,2,4)] # changing order of columns
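df_long <- arrange(df_long, ID) # from dplyr-package, changing order of rows (the result must be assigned to take effect)
df_long <- as.data.frame(unclass(df_long), stringsAsFactors = TRUE) # converting all dataframe characters to factors (stringsAsFactors defaults to FALSE since R 4.0.0)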
df_long$q1 <- factor(df_long$q1, levels = c("Yes", "No")) # reordering factor levels of "q1"
df_long$q2 <- factor(df_long$q2, levels = c("Win", "Lose")) # reordering factor levels of "q2"
This allows me to use ftable() and results in the table I want.
ftable(df_long, row.vars = c("q2"), col.vars = c("Topic", "q1"))
I have the impression that there should be easier ways to code this. What is a less elaborate, more automated and faster way to code this in R?
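One direction that might shorten this, sketched here with base R's reshape() rather than reshape2, is to declare the topic/question structure through the varying argument: each list element names the wide columns that belong to one long variable. This is an untuned sketch under the assumption that every topic has the same questions in the same order; it rebuilds the example data so that it runs on its own, and the object name ft is introduced only for illustration.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  ID  = c(1, 2, 3, 4, 5),
  Aq1 = c("Yes", "Yes", "Yes", "Yes", "No"),
  Aq2 = c("Win", "Lose", "Lose", "Lose", "Win"),
  Bq1 = c("No", "No", "No", "No", "Yes"),
  Bq2 = c("Lose", "Lose", "Win", "Win", "Win")
)

# Wide to long: one list element per question, one entry per topic
df_long <- reshape(
  df,
  direction = "long",
  idvar     = "ID",
  varying   = list(c("Aq1", "Bq1"), c("Aq2", "Bq2")),
  v.names   = c("q1", "q2"),
  timevar   = "Topic",
  times     = c("A", "B")
)

# Fix the level order so the table layout matches the desired one
df_long$q1 <- factor(df_long$q1, levels = c("Yes", "No"))
df_long$q2 <- factor(df_long$q2, levels = c("Win", "Lose"))

# Tabulate only the three classifying variables
ft <- ftable(df_long[, c("Topic", "q1", "q2")],
             row.vars = "q2", col.vars = c("Topic", "q1"))
print(ft)
```

With more topics or questions, the varying list and the times vector would have to be extended by hand or built from the column names, so this may not yet be as automated as I would like.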