0

I have longitudinal data in a wide format that I am trying to convert to long format:

#I have a wide data which look like this:
dat_wide <- read.table(text="
cid dyad f1 f2  op2 ed1 junk 
1   2    0  0   4   5   0.876
1   5    0  1   4   4   0.765
", header=TRUE)
#I want to convert it to long like this:
dat_long <- read.table(text="
cid dyad f op  ed junk  Visit
1   2    0 NA  5  0.876 1
1   2    0 4   NA 0.876 2
1   5    0 NA  4  0.765 1
1   5    1 4   NA 0.765 2 
", header=TRUE)

#R code I was trying:
dat_l2 = reshape(dat_wide,idvar='cid',  varying=list(c('f1','f2'), 'op2','ed1'),
                 #timevar='Visit',
                 times=c(1,2),
                 v.names=c('f','op','ed'),
                 direction='long') 
#gives error:Error in reshape(merge_wide1, idvar = "cid", varying = c("f1",  : length of 'v.names' does not evenly divide length of 'varying'

it's similar to Converting data from wide to long (using multiple columns)

The difference in my data is: I have some variables that were only recorded for fewer time points. eg.variable 'f' was recorded from both time 1 & time 2 but variable 'op' was recorded only for time2 (i.e. op2)& variable 'ed' was recorded only for time1 (i.e.ed1) head(data)

nnet
  • 39
  • 4

1 Answers1

1

You can use pivot_longer from tidyr :

tidyr::pivot_longer(dat_wide, 
                    cols = f1:ed1, 
                    names_to = c('.value', 'Visit'), 
                    names_pattern = '(.*)(\\d+)')

#    cid  dyad  junk Visit     f    op    ed
#  <int> <int> <dbl> <chr> <int> <int> <int>
#1     1     2 0.876 1         0    NA     5
#2     1     2 0.876 2         0     4    NA
#3     1     5 0.765 1         0    NA     4
#4     1     5 0.765 2         1     4    NA
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • This solves what I'hv asked here. Thank you Ronak! however, somehow it does not work for the bigger data I have. Gives error: "Error in `[.data.frame`(data, unique(names(data))) : undefined columns selected" I know it would be difficult without looking at actual data but any solution? – nnet Nov 12 '20 at 06:44
  • @nnet The `names_pattern` is important here. You need to make sure that the column names follow the same pattern as shown in the example, if they are not we may need to change our regex. – Ronak Shah Nov 12 '20 at 08:52
  • Try to use `names_pattern` as `names_pattern = '(.*?)(\\d+)'`. Can you provide a reproducible example with your real data? – Ronak Shah Nov 12 '20 at 23:43
  • If you can share the names of the data that should give lot of clue. `dput(names(dataframe_name))` – Ronak Shah Nov 14 '20 at 00:34
  • I can see that you have `NA` as column name. Can you remove that column and then check if the answer works for you. – Ronak Shah Nov 18 '20 at 03:31
  • @Ronak your solutions always amaze me. Can you please explain meaning of \\d+ here? – AnilGoyal Nov 18 '20 at 05:04
  • Thank you Anil. The `(\\d+)` part is to extract the number from the column names. – Ronak Shah Nov 18 '20 at 05:11