Long format dataframe to wide format dataframe using R

Question

I am not able to convert data from long to wide format by following the examples in this portal (R how to convert from long to wide format, Converting long format to wide format, converting a long-formated dataframe to wide format tidyverse). I am not sure what I am missing. I am trying to transform the long data frame below to a wide format; both of my attempts fail with the errors shown:

library(tidyr)

Y  <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","C","C","D","D","D")  
Z <- c("ABC","BCD","CDE","DEF","EFG","FGH","A12","B12","C12","A45","B45","C45","D45","E45","F45","G45","H45","X66","Y66","Z66")

df <- as.data.frame(cbind(Y,Z))

data_wide <- spread(df, Y, Z)

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 20 rows:
* 1, 2, 3, 4, 5, 6
* 7, 8, 9
* 10, 11, 12, 13, 14, 15, 16, 17
* 18, 19, 20


library(tidyverse)

data_wide <- pivot_wider(df, names_from = Y, values_from = Z, values_fill = "")

Error: Can't convert <character> to <list>.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

akrun · Accepted Answer · 2021-08-14T00:56:02.027


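That error is caused by duplicates: each value of Y occurs in several rows, so Y alone does not identify a unique output row. We need to add a unique sequence id within each group of Y: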

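library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

df %>% 
     mutate(rn = rowid(Y)) %>%  # sequence id that restarts within each Y
     spread(Y, Z) %>%           # rn now identifies each output row
     select(-rn)                # drop the helper column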

-output

     A    B   C    D
1  ABC  A12 A45  X66
2  BCD  B12 B45  Y66
3  CDE  C12 C45  Z66
4  DEF <NA> D45 <NA>
5  EFG <NA> E45 <NA>
6  FGH <NA> F45 <NA>
7 <NA> <NA> G45 <NA>
8 <NA> <NA> H45 <NA>
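
To see why the extra id is needed, here is a minimal sketch in base R only. The names key_counts and rn_base are illustrative and not from the answer, and ave() is used as a stand-in that should produce the same within-group counter as rowid(Y):

```r
# Rebuild the example data from the question
Y <- rep(c("A", "B", "C", "D"), times = c(6, 3, 8, 3))
Z <- c("ABC", "BCD", "CDE", "DEF", "EFG", "FGH", "A12", "B12", "C12", "A45",
       "B45", "C45", "D45", "E45", "F45", "G45", "H45", "X66", "Y66", "Z66")
df <- data.frame(Y, Z)

# Rows per key: every count above 1 is a set of rows that spread()
# cannot tell apart when Y is the only identifier.
key_counts <- table(df$Y)
print(key_counts)

# Within-group counter (assumed equivalent to data.table::rowid(Y)):
# it restarts at 1 for each distinct value of Y.
rn_base <- ave(seq_along(df$Y), df$Y, FUN = seq_along)

# Each (Y, rn_base) pair is now unique, so every output row is well defined
stopifnot(!any(duplicated(data.frame(Y = df$Y, rn = rn_base))))
```

The counts match the groups listed in the spread() error message (6, 3, 8 and 3 rows sharing a key).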

rowid is from data.table and is a compact way to create a sequence id within each group. If we want to use only dplyr, then use row_number() after group_by. Also, spread is superseded by pivot_wider, which is recommended for new code:

df %>%
    group_by(Y) %>%
    mutate(rn = row_number()) %>%  # sequence id within each Y
    ungroup() %>%
    pivot_wider(names_from = Y, values_from = Z) %>%
    select(-rn)                    # drop the helper column

-output

# A tibble: 8 x 4
  A     B     C     D    
  <chr> <chr> <chr> <chr>
1 ABC   A12   A45   X66  
2 BCD   B12   B45   Y66  
3 CDE   C12   C45   Z66  
4 DEF   <NA>  D45   <NA> 
5 EFG   <NA>  E45   <NA> 
6 FGH   <NA>  F45   <NA> 
7 <NA>  <NA>  G45   <NA> 
8 <NA>  <NA>  H45   <NA>
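
The pivot_wider call in the question passed values_fill = "" to get empty strings instead of NA. Once the rows are uniquely identified, that argument should work as intended. The sketch below assumes the named-list form values_fill = list(Z = ""), chosen because it is also accepted by older tidyr releases; data_wide is the name used in the question:

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
Y <- rep(c("A", "B", "C", "D"), times = c(6, 3, 8, 3))
Z <- c("ABC", "BCD", "CDE", "DEF", "EFG", "FGH", "A12", "B12", "C12", "A45",
       "B45", "C45", "D45", "E45", "F45", "G45", "H45", "X66", "Y66", "Z66")
df <- data.frame(Y, Z)

data_wide <- df %>%
    group_by(Y) %>%
    mutate(rn = row_number()) %>%   # sequence id within each Y
    ungroup() %>%
    # fill the cells of shorter groups with "" instead of NA
    pivot_wider(names_from = Y, values_from = Z,
                values_fill = list(Z = "")) %>%
    select(-rn)

print(data_wide)
```

Note that the empty string is an ordinary character value, so is.na() no longer flags the padded cells.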

edited Aug 14 '21 at 00:56

answered Aug 14 '21 at 00:53


That was so quick. – RanonKahn Aug 14 '21 at 00:53
May you also please post the previous solution you posted before the above solution? – RanonKahn Aug 14 '21 at 00:55
@RanonKahn I have both updates in the same post. Is that what you meant – akrun Aug 14 '21 at 00:56
There was a oneliner – RanonKahn Aug 14 '21 at 00:59
@RanonKahn it is the same thing :=) I just `split` into multiple lines `df %>% mutate(rn = rowid(Y)) %>% spread(Y, Z)` – akrun Aug 14 '21 at 01:00

Oh! I realize that now. Thanks a lot, if I am allowed to mention it. – RanonKahn Aug 14 '21 at 01:01
Three years back, I think I saw your R tutorial with a lot of tips for data wrangling. If the tutorial webpages are still available, can you please provide the link? – RanonKahn Aug 14 '21 at 05:25
@RanonKahn I don't remember any tutorial webpage I created though – akrun Aug 14 '21 at 17:59
Hi @akrun, can you please take a look at this query [https://stackoverflow.com/questions/69007210/how-to-automate-plotting-a-line-plot-of-all-data-points-and-overlay-dose-respons] and suggest how to proceed further. – RanonKahn Sep 01 '21 at 15:37
