Is there an R function to get my database in this form ? Tried gather and it doesn't solve my problem

Question

I have the dataframe called Days which has 3 variables A,B,C, and they can take values from 1 to 5, but two or more variables can't take the same value in the same day. This df looks like this:

Days<-structure(list(Day = c(1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1), 
                          A = c(5L, 3L, 2L, 1L, 1L, 1L, 4L, 3L, 1L, 5L), 
                          B = c(3L, 5L, 1L, 2L, 2L, 4L, 3L, 1L, 5L, 4L), 
                          C = c(4L,4L, 5L, 5L, 3L, 5L, 5L, 5L, 2L, 1L)), 
                          class = "data.frame", row.names = c(NA,-10L))
> Days
    Day A B C
1  1.01 5 3 4
2  1.02 3 5 4
3  1.03 2 1 5
4  1.04 1 2 5
5  1.05 1 2 3
6  1.06 1 4 5
7  1.07 4 3 5
8  1.08 3 1 5
9  1.09 1 5 2
10 1.10 5 4 1

What I need is to get Days to look like this :

Day     1   2   3   4   5
1.01            B   C   A
1.02            A   C   B
1.03    B   A           C
1.04    A   B           C
1.05    A   B   C       
1.06    A           B   C
1.07            B   A   C
1.08    B       A       C
1.09    A   C           B
1.1     C       B   A

If instead of blank spaces it would appear 0, it wouldn't be a problem.

I have tried using gather, with and without the as.factor for the A,B and C, but all I obtained was a form like this:

> DaysG<-gather(Days,my_key,my_value,-Day)
> DaysG
    Day my_key my_value
1  1.01      A        5
2  1.02      A        3
3  1.03      A        2
4  1.04      A        1
5  1.05      A        1
6  1.06      A        1
7  1.07      A        4
8  1.08      A        3
9  1.09      A        1
10 1.10      A        5
11 1.01      B        3
12 1.02      B        5
13 1.03      B        1
14 1.04      B        2
15 1.05      B        2
16 1.06      B        4
17 1.07      B        3
18 1.08      B        1
19 1.09      B        5
20 1.10      B        4
21 1.01      C        4
22 1.02      C        4
23 1.03      C        5
24 1.04      C        5
25 1.05      C        3
26 1.06      C        5
27 1.07      C        5
28 1.08      C        5
29 1.09      C        2
30 1.10      C        1

Please help!!!

Thank you!

Use spread right after using gather. – Daniel R Jul 12 '20 at 18:50 — Daniel R, Jul 12 '20 at 18:50

akrun · Answer 1 · 2020-07-12T20:51:09.440

We can reshape to 'long' format and then to 'wide' with pivot_longer', 'pivot_wider' from tidyr`

library(dplyr)
library(tidyr)
Days %>% 
   pivot_longer(cols = A:C) %>% 
   pivot_wider(names_from = value, values_from = name)  %>% 
   select(Day, as.character(1:5))
# A tibble: 10 x 6
#     Day `1`   `2`   `3`   `4`   `5`  
#   <dbl> <chr> <chr> <chr> <chr> <chr>
# 1  1.01 <NA>  <NA>  B     C     A    
# 2  1.02 <NA>  <NA>  A     C     B    
# 3  1.03 B     A     <NA>  <NA>  C    
# 4  1.04 A     B     <NA>  <NA>  C    
# 5  1.05 A     B     C     <NA>  <NA> 
# 6  1.06 A     <NA>  <NA>  B     C    
# 7  1.07 <NA>  <NA>  B     A     C    
# 8  1.08 B     <NA>  A     <NA>  C    
# 9  1.09 A     C     <NA>  <NA>  B    
#10  1.1  C     <NA>  <NA>  B     A

If we need to automatically reorder, then convert to factor with levels specified in that order

Days %>%
   pivot_longer(cols = A:C) %>%  
   mutate(value = factor(value, levels = sort(unique(value)))) %>%
   pivot_wider(names_from = value, values_from = name, names_sort = TRUE)  
# A tibble: 10 x 6
#     Day `1`   `2`   `3`   `4`   `5`  
#   <dbl> <chr> <chr> <chr> <chr> <chr>
# 1  1.01 <NA>  <NA>  B     C     A    
# 2  1.02 <NA>  <NA>  A     C     B    
# 3  1.03 B     A     <NA>  <NA>  C    
# 4  1.04 A     B     <NA>  <NA>  C    
# 5  1.05 A     B     C     <NA>  <NA> 
# 6  1.06 A     <NA>  <NA>  B     C    
# 7  1.07 <NA>  <NA>  B     A     C    
# 8  1.08 B     <NA>  A     <NA>  C    
# 9  1.09 A     C     <NA>  <NA>  B    
#10  1.1  C     <NA>  <NA>  B     A

Or an one-liner with recast

library(reshape2)
recast(Days, id.var = 'Day', Day ~ value, value.var = 'variable')
#    Day    1    2    3    4    5
#1  1.01 <NA> <NA>    B    C    A
#2  1.02 <NA> <NA>    A    C    B
#3  1.03    B    A <NA> <NA>    C
#4  1.04    A    B <NA> <NA>    C
#5  1.05    A    B    C <NA> <NA>
#6  1.06    A <NA> <NA>    B    C
#7  1.07 <NA> <NA>    B    A    C
#8  1.08    B <NA>    A <NA>    C
#9  1.09    A    C <NA> <NA>    B
#10 1.10    C <NA> <NA>    B    A

Is there a way to order ascending the columns? I am talking about 50+ numerical columns that are really randomly assigned by the method that you showed me. Thank you! — Val Noir, Jul 12 '20 at 20:47

score 0 · Answer 2 · answered Jul 12 '20 at 19:01

Try this:

library(reshape2)
library(tidyr)
#Melt
DMelt <- melt(Days,id.vars = 'Day')
DMelt$value <- factor(DMelt$value,levels = sort(unique(DMelt$value)),ordered = T)
#Reshape
df <- reshape(DMelt,idvar = 'Day',timevar = 'value',direction = 'wide')

    Day variable.5 variable.3 variable.2 variable.1 variable.4
1  1.01          A          B       <NA>       <NA>          C
2  1.02          B          A       <NA>       <NA>          C
3  1.03          C       <NA>          A          B       <NA>
4  1.04          C       <NA>          B          A       <NA>
5  1.05       <NA>          C          B          A       <NA>
6  1.06          C       <NA>       <NA>          A          B
7  1.07          C          B       <NA>       <NA>          A
8  1.08          C          A       <NA>          B       <NA>
9  1.09          B       <NA>          C          A       <NA>
10 1.10          A       <NA>       <NA>          C          B

Is there an R function to get my database in this form ? Tried gather and it doesn't solve my problem

2 Answers2