0

I'm trying to turn a dataframe from long to wide format. Currently the df is set up like so:

dput(head(df,10))
structure(list(TECH_ID = c("14050154", "14050154", "13835650", 
"13835650", "13469601", "13469601", "13782883", "13782883", "12342837", 
"12342837"), MNSCU_QUES = c("What language did you learn to speak first?", 
"Which language do you speak most often at home?", "What language did you learn to speak first?", 
"Which language do you speak most often at home?", "What language did you learn to speak first?", 
"Which language do you speak most often at home?", "What language did you learn to speak first?", 
"Which language do you speak most often at home?", "What language did you learn to speak first?", 
"Which language do you speak most often at home?"), MNSCU_RESP = c("English and another language", 
"Another", "English only", "English", "English only", "English", 
"English and another language", "English", "English only", "English"
)), row.names = c(NA, 10L), class = "data.frame")

I am trying to set up the dataframe so it reads like this:

enter image description here

I've been trying to use this code here:

df_wide <- dcast(df, TECH_ID+MNSCU_RESP~MNSCU_QUES)

But the resulting dataframe looks like this:

Code:

dput(head(df_wide,10))
structure(list(TECH_ID = c("00007179", "00007179", "00008201", 
"00008201", "00020900", "00020900", "00021757", "00021757", "00031227", 
"00031227"), MNSCU_RESP = c("English", "English only", "English", 
"English only", "English", "English only", "English", "English only", 
"English", "English only"), `What language did you learn to speak first?` = c(0L, 
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), `Which language do you speak most often at home?` = c(1L, 
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)), row.names = c(NA, 10L), class = "data.frame")

Visual: enter image description here

alexvc
  • 67
  • 7
  • `tidyr::pivot_wider(df, names_from = MNSCU_QUES, values_from = MNSCU_RESP)` – Ronak Shah Oct 29 '20 at 15:09
  • @RonakShah I get the following error: Warning message: Values in `MNSCU_RESP` are not uniquely identified; output will contain list-cols. * Use `values_fn = list(MNSCU_RESP = list)` to suppress this warning. * Use `values_fn = list(MNSCU_RESP = length)` to identify where the duplicates arise * Use `values_fn = list(MNSCU_RESP = summary_fun)` to summarise duplicates – alexvc Oct 29 '20 at 16:34
  • That doesn't happen for the data shared but you can try this answer https://stackoverflow.com/questions/58837773/pivot-wider-issue-values-in-values-from-are-not-uniquely-identified-output-w/ – Ronak Shah Oct 29 '20 at 22:01

1 Answers1

0
library(reshape2)

df <- structure(list(TECH_ID = c("14050154", "14050154", "13835650", 
                           "13835650", "13469601", "13469601", "13782883", "13782883", "12342837", 
                           "12342837"), MNSCU_QUES = c("What language did you learn to speak first?", 
                                                       "Which language do you speak most often at home?", "What language did you learn to speak first?", 
                                                       "Which language do you speak most often at home?", "What language did you learn to speak first?", 
                                                       "Which language do you speak most often at home?", "What language did you learn to speak first?", 
                                                       "Which language do you speak most often at home?", "What language did you learn to speak first?", 
                                                       "Which language do you speak most often at home?"), MNSCU_RESP = c("English and another language", 
                                                                                                                          "Another", "English only", "English", "English only", "English", 
                                                                                                                          "English and another language", "English", "English only", "English"
                                                       )), row.names = c(NA, 10L), class = "data.frame")

df_wide <- reshape2::dcast(df, TECH_ID~MNSCU_QUES, value.var = "MNSCU_RESP")

> df_wide
   TECH_ID What language did you learn to speak first? Which language do you speak most often at home?
1 12342837                                English only                                         English
2 13469601                                English only                                         English
3 13782883                English and another language                                         English
4 13835650                                English only                                         English
5 14050154                English and another language                                         Another
ismirsehregal
  • 30,045
  • 5
  • 31
  • 78
  • I get the following error: Aggregation function missing: defaulting to length and the output looks like this: TECH_ID What language did you learn to speak first? Which language do you speak most often at home? 7179, 1,1 – alexvc Oct 29 '20 at 16:36