2

I am processing the results of a questionnaire which repeats the same set of questions but each time on a different topic. In other words, the dataset contains subsets of variables that topic wise are related. I want to present these results in a flat contingency table. This requires me to transform my data from wide to long format. The problem I face is how to instruct R that these relationships exists when converting data from wide to long format.

The code below relates to four questions answered by five respondents and illustrates my questionnaire.

ID  <- c(1,      2,      3,      4,      5)
Aq1 <- c("Yes",  "Yes",  "Yes",  "Yes",  "No")
Aq2 <- c("Win",  "Lose", "Lose", "Lose", "Win")
Bq1 <- c("No",   "No",   "No",   "No",   "Yes")
Bq2 <- c("Lose", "Lose", "Win",  "Win",  "Win")

The questionnaire contains two topics (A and B). For each topic the same two questions are being asked (q1 and q2). I create a dataframe.

df <- data.frame(ID, Aq1, Aq2, Bq1, Bq2)

From this dataframe I wish to create the following table:

           A        B
           Yes No   Yes No
 Win       1   1    1   2
 Lose      3   0    0   2

I plan to create a flat contingency table using ftable(). This requires me to change the structure of the dataframe from wide to the following long format.

ID Topic  q1     q2
1  A      Yes    Win
1  B      No     Lose
2  A      Yes    Lose
2  B      No     Lose

etc.

Calling on the reshape2 and dplyr packages I use:

df_long <- melt(df, id.vars = c("ID", "Aq2", "Bq2"), value.name = "q1") # from reshape2-package

Notice the warning message:

"attributes are not identical across measure variables; they will be dropped"

df_long$Topic <- substr(df_long$variable, start = 1, stop = 1) # creating a vectors with topics A and B
df_long$q2 <- c(Aq2, Bq2) # manually constructing "q2"
df_long <- df_long[,-c(2:4)] # ridding the original vectors "Aq2" and "Bq2"
df_long <- df_long[, c(1,3,2,4)] # changing order of columns
arrange(df_long, ID) # from dplyr-package, changing order of rows
df_long <- as.data.frame(unclass(df_long)) # converting all dataframe characters to factors
df_long$q1 <- factor(df_long$q1, levels = c("Yes", "No")) # reordering factor levels of "q1"
df_long$q2 <- factor(df_long$q2, levels = c("Win", "Lose"))  # reordering factor levels of "q2"

This allows me to use ftable() and results in the table I want.

ftable(df_long, row.vars = c("q2"), col.vars = c("Topic", "q1"))

I have the impression that there should be easier ways to code this. What is a less elaborate, more automated and faster way to code this in R?

Arthur Yip
  • 5,810
  • 2
  • 31
  • 50
Arie
  • 55
  • 1
  • 5

2 Answers2

3

You could do the same thing with dplyr and tidyr packages:

df %>%
  gather(key, value, -ID) %>%
  mutate(q = substr(key,2,3),
         Topic = substr(key,1,1)) %>%
  select(-key) %>%
  spread(q,value)

gives

   ID Topic  q1   q2
1   1     A Yes  Win
2   1     B  No Lose
3   2     A Yes Lose
4   2     B  No Lose
5   3     A Yes Lose
6   3     B  No  Win
7   4     A Yes Lose
8   4     B  No  Win
9   5     A  No  Win
10  5     B Yes  Win

add the following:

%>% 
  select(q2,q1,Topic) %>%
  table() 

to get:

, , Topic = A

      q1
q2     No Yes
  Lose  0   3
  Win   1   1

, , Topic = B

      q1
q2     No Yes
  Lose  2   0
  Win   2   1
Wietze314
  • 5,942
  • 2
  • 21
  • 40
2

Edit: as promised, tidyr 1.0.0 has replaced gather and spread with pivot_longer. Now the solution to this question is one line, once you figure out the appropriate arguments to pivot_longer as I did here:

ID  <- c(1,      2,      3,      4,      5)
Aq1 <- c("Yes",  "Yes",  "Yes",  "Yes",  "No")
Aq2 <- c("Win",  "Lose", "Lose", "Lose", "Win")
Bq1 <- c("No",   "No",   "No",   "No",   "Yes")
Bq2 <- c("Lose", "Lose", "Win",  "Win",  "Win")
df <- data.frame(ID, Aq1, Aq2, Bq1, Bq2)

library(tidyverse)
(df_for_ftable <- df %>% 
    pivot_longer(Aq1:Bq2, names_to = c("Topic", ".value"), names_pattern = "(.)(..)"))
#> # A tibble: 10 x 4
#>       ID Topic q1    q2   
#>    <dbl> <chr> <fct> <fct>
#>  1     1 A     Yes   Win  
#>  2     1 B     No    Lose 
#>  3     2 A     Yes   Lose 
#>  4     2 B     No    Lose 
#>  5     3 A     Yes   Lose 
#>  6     3 B     No    Win  
#>  7     4 A     Yes   Lose 
#>  8     4 B     No    Win  
#>  9     5 A     No    Win  
#> 10     5 B     Yes   Win
(df_for_ftable %>% ftable(row.vars = c("q2"), col.vars = c("Topic", "q1")))
#>      Topic  A      B    
#>      q1    No Yes No Yes
#> q2                      
#> Lose        0   3  2   0
#> Win         1   1  2   1

Created on 2019-09-18 by the reprex package (v0.3.0)


Older answer: tidyr's gather and spread, with help from separate, will do the trick. There are two newer functions in development, pivot_wider and pivot_longer, and they have better syntax, but we'll work with gather and spread for now.

ID  <- c(1,      2,      3,      4,      5)
Aq1 <- c("Yes",  "Yes",  "Yes",  "Yes",  "No")
Aq2 <- c("Win",  "Lose", "Lose", "Lose", "Win")
Bq1 <- c("No",   "No",   "No",   "No",   "Yes")
Bq2 <- c("Lose", "Lose", "Win",  "Win",  "Win")
df <- data.frame(ID, Aq1, Aq2, Bq1, Bq2)

library(tidyverse)
(df_for_ftable <- df %>% 
  gather(key = "topic_and_q", value = "ans", -ID) %>%
  separate("topic_and_q", c("Topic", "q"), 1) %>%
  spread(q, ans))
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#>    ID Topic  q1   q2
#> 1   1     A Yes  Win
#> 2   1     B  No Lose
#> 3   2     A Yes Lose
#> 4   2     B  No Lose
#> 5   3     A Yes Lose
#> 6   3     B  No  Win
#> 7   4     A Yes Lose
#> 8   4     B  No  Win
#> 9   5     A  No  Win
#> 10  5     B Yes  Win

df_for_ftable %>% ftable(row.vars = c("q2"), col.vars = c("Topic", "q1"))
#>      Topic  A      B    
#>      q1    No Yes No Yes
#> q2                      
#> Lose        0   3  2   0
#> Win         1   1  2   1

Created on 2019-09-02 by the reprex package (v0.3.0)

Arthur Yip
  • 5,810
  • 2
  • 31
  • 50