1

I'm struggling to get my data (xlsx file) in the correct shape. My original database is as follows:

   patient when    age weight height watchID dateFrom           
   <chr>   <chr> <dbl> <dbl>   <dbl>   <dbl> <dttm>             
 1 T01     pre      82 83        174    2788 2017-07-24
 2 T02     pre      81 80        166    7309 2017-07-22 
 3 T02     post     67 91        163    7309 2017-10-26 
 4 T03     pre      68 91        172    5066 2017-07-26 
 5 T03     post     68 91        172    7220 2017-10-24 

I want to get a wide database in which the there is only one patient ID based on the "when" column. But when I try to reshape it I finally get to this with the "dcast" function:

   patient age_post age_pre weight_post weight_pre height_post height_pre
   <chr>      <int>   <int>       <int>      <int>       <int>      <int>
 1 T01            0       1           0          1           0          1
 2 T02            1       1           1          1           1          1
 3 T03            1       1           1          1           1          1
 4 T04            0       1           0          1           0          1
 5 T05            1       0           1          0           1          0

Somehow it changes all the variables to 1 and 0. How do I get a similar database with different variable types with "pre" and "post" attached to the original columns?

This is my code ("HW" is the original dataset mentioned above):

mdata <- melt(HW, id=c("patient","when"))
mdata$value <- as.numeric(as.character(mdata$value)) #I added this line to convert the column to numeric but it doesn't help
mdata2 <- dcast(mdata, patient~variable+when)

I also tried it with:

mdata <- melt(HW, id=c("patient","when"))
mdata3 <- reshape(mdata, idvar='patient', timevar='when', direction='wide')

But then I get this:

   patient variable.pre value.pre variable.post value.post
   <chr>   <fct>        <chr>     <fct>         <chr>     
 1 T01     age          82        NA            NA        
 2 T02     age          81        age           67        
 3 T03     age          68        age           68        
 4 T04     age          81        NA            NA        
 5 T05     NA           NA        age           87 

Without the other variables.

Thanks in advance.

Ztarrk
  • 13
  • 2

1 Answers1

0

Does this do what you want?

library(tidyr)
df <- tibble(patient = c("T01","T02","T02","T03","T03"),
             when = c("pre","pre","post","pre","post"),
             age = c(82,81,67,68,68),
             weight = c(83,80,91,91,91),
             height = c(174,166,163,172,172),
             watchid = c(2788,7309,7309,5066,7220),
             datefrom = c("2017-07-24","2017-07-22","2017-10-26",
                          "2017-07-26","2017-10-24"))

df %>%
  pivot_wider(names_from = when,
              values_from = c(age,weight,height,watchid,datefrom))

A tibble: 3 x 11
  patient age_pre age_post weight_pre weight_post height_pre height_post watchid_pre watchid_post
  <chr>     <dbl>    <dbl>      <dbl>       <dbl>      <dbl>       <dbl>       <dbl>        <dbl>
1 T01          82       NA         83          NA        174          NA        2788           NA
2 T02          81       67         80          91        166         163        7309         7309
3 T03          68       68         91          91        172         172        5066         7220
flafont11
  • 137
  • 7
  • this looks like what I need. If I use your code exactly it works, however, when I try it with my database which has 75 columns it just discards the "when" column without doing anything else. – Ztarrk Jul 28 '20 at 09:58
  • This is because you need to specify all the columns that have values linked to the variable "when" so that they are modified. However you should not need to type all the column names. this Should do the trick. ``` cols <- names(df[,3:75]) # change the numbers according to the cols you need df %>% pivot_wider(names_from = when, values_from = cols) ``` – flafont11 Jul 28 '20 at 10:06