-4

R: Reshape data from long to wide, including a date

(I was trying to include data, but the formatting was so horrible that I gave up).

I have a question on reshaping longitudinal data. The trick is that I have multiple rows per subject ('row_num' ranges from 1 to 8).

I've looked at over 20 Stack Overflow pages, some other pages, and Hadley's posts (and articles) on reshaping, and there seems to be a gap I can't find:

I have got multiple rows per subject, and can't find out how to turn that into one row. The trick is that I need to take (say) Q1,...,Q5, and turn it into two or more blocks of questions (Q1.1,...,Q5.1,Q1.2,...Q5.2, ...)

Using one terminology, 'row_num' is a within-subject variable, and everything seems to be based on between-subject variables.

It puzzles and frustrates me, since IMHO the classic longitudinal to wide situation = multiple rows per subject (with one block of variables) to one row per subject (with multiple blocks of variables).

Could somebody direct me to some answers or tutorials which would cover this?

Thank you very much!

Upon edit, here is what a dput of the original data frame would look like:

# A tibble: 6 x 9
  Subject_ID row_num Date          Q1 Q2_text       Q3    Q4    Q5 Q6_text      
       <dbl>   <dbl> <date>     <dbl> <chr>      <dbl> <dbl> <dbl> <chr>        
1          1       1 2019-01-01     4 Because        5     5     1 and so on    
2          1       2 2019-01-02     1 O, bother      5     4     1 NA           
3          1       3 2019-01-03     2 NA             3     4    NA NA           
4          2       1 2018-12-04    NA NA             1     4     1 NA           
5          3       1 2018-12-15     3 In addtion     5    NA     3 NA           
6          3       2 2018-12-26     1 NA             4     3     2 in conclusion
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
Barry DeCicco
  • 251
  • 1
  • 7
  • Without sample data it's hard to understand what you are asking, but is it possible what you want is not reshaping, but grouping? (Since you say 'I have got multiple rows per subject, and can't find out how to turn that into one row.') – anotherfred Jan 19 '19 at 18:44
  • That sounds like a path to examine. I'll check that out! – Barry DeCicco Jan 19 '19 at 19:00

2 Answers2

0

We could help if you provided some sample data. The web is full of tutorials for using the following functions that do what you are interested in:

gather() and spread() from the tidyr package

melt() and dcast() from the reshape2 package

Claudiu Papasteri
  • 2,469
  • 1
  • 17
  • 30
  • Certainly! I'm trying to figure out how to post data, in a way which won't have a mangled format. – Barry DeCicco Jan 19 '19 at 18:59
  • Is this good? structure(list(Subject_ID = c(1, 1, 1, 2, 3, 3), row_num = c(1, 2, 3, 1, 1, 2), Date = structure(c(17897, 17898, 17899, 17869, 17880, 17891), class = "Date"), Q1 = c(4, 1, 2, NA, 3, 1), Q2_text = c("Because", "O, bother", NA, NA, "In addtion", NA), Q3 = c(5, 5, 3, 1, 5, 4), Q4 = c(5, 4, 4, 4, NA, 3), Q5 = c(1, 1, NA, 1, 3, 2), Q6_text = c("and so on", NA, NA, NA, NA, "in conclusion")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame") – Barry DeCicco Jan 19 '19 at 19:19
  • use `dput()` on your data if you can. – Claudiu Papasteri Jan 19 '19 at 20:44
  • I reread the example and have added the results of printing the dput() command to the original question. BTW, I've poured over the reshape2 package, and a large number of tutorials on the reshape (and reshape2) packages. The thing which repeatedly hits me is that the examples seem to not fit what I'm trying to do (and it surprises me quite a bit!). They really come down to each row stays as a row (after melting and recasting. I need to have multiple rows combined into one row, with each set of measurements from a row forming a block of measurements in the new row. – Barry DeCicco Jan 19 '19 at 21:21
0

Here's an approach using tidyr.

library(tidyr)
df2 <- df %>%
  # (optional) First convert all the data columns to text so 
  #   they're readable throughout the process.
  mutate_at(vars(Date:Q6_text), as.character) %>%

  # Gather into long format, where we record the column it came from 
  #   as "question" and the value it held as "value"
  gather(question, value, -Subject_ID, -row_num) %>%

  # Combine the row_num and question into a new column
  unite("question2", c("row_num", "question")) %>%

  # Use that new column to spread everything out
  spread(question2, value)


> df2
  Subject_ID     1_Date 1_Q1  1_Q2_text 1_Q3 1_Q4 1_Q5 1_Q6_text     2_Date 2_Q1 2_Q2_text 2_Q3 2_Q4 2_Q5     2_Q6_text     3_Date 3_Q1 3_Q2_text 3_Q3 3_Q4 3_Q5 3_Q6_text
1          1 2019-01-01    4    Because    5    5    1 and so on 2019-01-02    1 O, bother    5    4    1          <NA> 2019-01-03    2      <NA>    3    4 <NA>      <NA>
2          2 2018-12-04 <NA>       <NA>    1    4    1      <NA>       <NA> <NA>      <NA> <NA> <NA> <NA>          <NA>       <NA> <NA>      <NA> <NA> <NA> <NA>      <NA>
3          3 2018-12-15    3 In addtion    5 <NA>    3      <NA> 2018-12-26    1      <NA>    4    3    2 in conclusion       <NA> <NA>      <NA> <NA> <NA> <NA>      <NA>
Jon Spring
  • 55,165
  • 4
  • 35
  • 53