0

Let's suppose we have the dataframe below:

df <- read.table(header=T, text=
'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

How can I rearrange this dataframe to make it look like the following :

ID  ATM MEN1 BRCA1  RAD51C  CHEK2   MUTYH   BRCA2
1    3    1                 
2             3       2                       2
3                             1     
4                                      1      3

Please note that each row is now a unique case and the column Type was used to provide the values for the new created columns.

Allan
  • 321
  • 1
  • 8

1 Answers1

1

Your data is long/tidy. You want it to be wide. There are many functions to do this in R. A commonly used one is tidyr::pivot_wider(), which I demonstrate below:

library(tidyverse)


df <- read.table(header=T, text=
                   'Patient_ID    Gene         Type
1           ATM             3
1           MEN1            1
2           BRCA1           3
2           RAD51C          2
2           BRCA2           2
3           CHEK2           1
4           MUTYH           1
4           BRCA2           3', stringsAsFactors=F)

# Blank cells will be NA
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene,
              values_from = Type) 
#> # A tibble: 4 × 8
#>      ID   ATM  MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <int> <int> <int>  <int> <int> <int> <int>
#> 1     1     3     1    NA     NA    NA    NA    NA
#> 2     2    NA    NA     3      2     2    NA    NA
#> 3     3    NA    NA    NA     NA    NA     1    NA
#> 4     4    NA    NA    NA     NA     3    NA     1

# Blank cells as empty strings ("")
df |> 
  rename(ID = Patient_ID) |> 
  pivot_wider(names_from = Gene, 
              values_from = Type, 
              values_fn = as.character, 
              values_fill = "")
#> # A tibble: 4 × 8
#>      ID ATM   MEN1  BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#>   <int> <chr> <chr> <chr> <chr>  <chr> <chr> <chr>
#> 1     1 "3"   "1"   ""    ""     ""    ""    ""   
#> 2     2 ""    ""    "3"   "2"    "2"   ""    ""   
#> 3     3 ""    ""    ""    ""     ""    "1"   ""   
#> 4     4 ""    ""    ""    ""     "3"   ""    "1"

Created on 2022-05-23 by the reprex package (v2.0.1)

EDIT: Second solution simplified in line with @DarrenTsai's comment

Matt Cowgill
  • 659
  • 4
  • 13
  • Thanks @DarrenTsai, good suggestion, will edit – Matt Cowgill May 23 '22 at 03:34
  • Thank you both. I had to replace `|>` by `%>%` to make it work. – Allan May 23 '22 at 03:42
  • You must be on an older version of R @Allan, |> was added in R 4.1 – Matt Cowgill May 23 '22 at 03:51
  • Studying more about my question, I found a solution using reshape: ` reshape(df, idvar = "Patient_ID", timevar = "Gene", times = "Type", direction = "wide") ` . @matt, Is there any advantage relative to one another ? (performance, etc...) – Allan May 23 '22 at 13:25