1

Hi I would like to transform a long data frame into a wide df based on 3 variables (columns), resulting a total of 6 columns.

Here I have an example in long data frame as below

dat <- data.frame("time1"=c(6.28,17,49,72,56),
                  "target"=c("A","C","B","C","B"),
                  "day"=c(1,3,2,1,3))

Expected outcome:

day1.time  day1.target day2.time  day2.target day3.time  day3.target
6.28          A         49           B             17        C  
72            C         NA           NA            56        B

I'd tried the approches below using pivot_wide from tidyr package. It gave me the appended variable rather than the separated colnums as I wish. Wonder is there any suggestion to achieve the desired outcome.

dat %>%  pivot_wider(names_from = c(day,target),
              values_from = time1 ,
              values_fill = NA) 

 A tibble: 1 x 5
  `1_A` `3_C` `2_B` `1_C` `3_B`
  <dbl> <dbl> <dbl> <dbl> <dbl>
1  6.28    17    49    72    56

I had also tried to subset the data based on target / day and then using pivot_wide, as you can guess that the column of day1.target etc won't be there. The original data is huge, so I cannot even handle it with excel (manually). Will really appreciate it if there is any pointer for solving this problem. Cheers!

Hanscot
  • 77
  • 8

1 Answers1

1

The names_from should be a single column whereas values_from can be from multiple columns along with the fact that we need to consider for duplicate rows. Either do a group_by(day) %>% mutate(rn = row_number()) or use rowid(day) from data.table. So, a sequence column can be created before doing the pivot_wider

library(dplyr)
library(tidyr)
library(data.table)
dat %>% 
  mutate( rn = rowid(day)) %>% 
  pivot_wider(names_from = day, values_from = c(time1, target), 
        names_prefix = 'day', names_sep = ".") %>%
  select(-rn)
akrun
  • 874,273
  • 37
  • 540
  • 662