1

I am not able to convert data from long to wide format following the examples in this portal (R how to convert from long to wide format, Converting long format to wide format, converting a long-formated dataframe to wide format tidyverse). I am not sure what am I missing. I am trying to transform the long data frame to a wide format as shown below:

enter image description here

library(tidyr)

Y  <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","C","C","D","D","D")  
Z <- c("ABC","BCD","CDE","DEF","EFG","FGH","A12","B12","C12","A45","B45","C45","D45","E45","F45","G45","H45","X66","Y66","Z66")

df <- as.data.frame(cbind(Y,Z))

data_wide <- spread(df, Y, Z)

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 20 rows:
* 1, 2, 3, 4, 5, 6
* 7, 8, 9
* 10, 11, 12, 13, 14, 15, 16, 17
* 18, 19, 20


library(tidyverse)

data_wide <- pivot_wider(df, names_from = Y, values_from = Z, values_fill = "")

Error: Can't convert <character> to <list>.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
RanonKahn
  • 853
  • 10
  • 34

1 Answers1

2

That error is based on duplicates. We need a unique sequence id

library(dplyr)
library(tidyr)
library(data.table)
df %>% 
     mutate(rn = rowid(Y)) %>% 
     spread(Y, Z) %>%
     select(-rn)

-output

 A    B   C    D
1  ABC  A12 A45  X66
2  BCD  B12 B45  Y66
3  CDE  C12 C45  Z66
4  DEF <NA> D45 <NA>
5  EFG <NA> E45 <NA>
6  FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>

rowid is from data.table which is a compact way to create a sequence id. If we want to use dplyr, then use row_number() after group_by. Also, spread is deprecated in favor of pivot_wider

df %>%
    group_by(Y) %>%
    mutate(rn = row_number()) %>%
    ungroup %>%
    pivot_wider(names_from = Y, values_from = Z) %>%
    select(-rn)

-ouput

# A tibble: 8 x 4
  A     B     C     D    
  <chr> <chr> <chr> <chr>
1 ABC   A12   A45   X66  
2 BCD   B12   B45   Y66  
3 CDE   C12   C45   Z66  
4 DEF   <NA>  D45   <NA> 
5 EFG   <NA>  E45   <NA> 
6 FGH   <NA>  F45   <NA> 
7 <NA>  <NA>  G45   <NA> 
8 <NA>  <NA>  H45   <NA> 
akrun
  • 874,273
  • 37
  • 540
  • 662