1

Hi i have data that looks like this :

X             snp_id is_severe encoding_1 encoding_2 encoding_0   chisq   p.value  AF_total  AF_LATIN
1 1 chr21-10139833-A-C         0          0          1          7 1.70625 0.191       0.4          0.3      
2 2 chr21-10139833-A-C         1          0          0         13 1.70625 0.191        0.4        0.3
3 3 chr21-10141374-T-C         0          0          1          7 1.70625 0.191        0.5         0.2
4 4 chr21-10141374-T-C         1          0          0         13 1.70625 0.191        0.5          0.2

and it continues to the right with

            snp_id         REF         ALT
chr21-10139833-A-C         A            C
chr21-10139833-A-C        A              C   
chr21-10141374-T-C         T              C
chr21-10141374-T-C         T             C

this data is very long and every snp_id has is_severe 0 and is_sever_1 ( in this example the p/chisq values are the same but in the whole data they are different what i wish to do is to reshape its structure and make it looks like this

 snp_id              is_severe_0_encoding_0 is_severe_0_encoding_1 is_severe_0_encoding_2    
chr21-10139833-A-C         7                          0                    1                        
chr21-10141374-T-C        7                           0                    1        
       

and the table continues to the right with this:

snp_id              is_severe_1_encoding_0 is_severe_1_encoding_1 is_severe_1_encoding_2    
chr21-10139833-A-C         13                         0                    0                        
chr21-10141374-T-C        13                           0                    0       
       

and the table continues to the right with this:

snp_id chisq p.value Af_total Af_latin REF ALT
chr21-10139833-A-C 1.70625 0.191 0.4 0.3 A C
chr21-10141374-T-C 1.70625 0.191 0.5 0.2 T C

i saw some answers in stack overflow on this topic but couldn't find that would fit my problem
for example : Converting data from wide to long format when id variables are encoded in column header How to reshape data from long to wide format

would appreciate any help

code for sample data :

snp_id <- c("chr21-10139833-A-C", "chr21-10139833-A-C","chr21-10141374-T-C","chr21-10141374-T-C")
is_severe <- c("0", "1","0","1")
encoding_1=c(0,0,0,0)
encoding_2=c(1,0,1,0)
encoding_0=c(7,13,7,13)
chisq=c(1.70625,1.70625,1.70625,1.70625)
pvalue=c(0.191,0.191,0.191,0.191)
REF=c("A","A","T","T")
ALT=c("C","C","C","C")
AF_TOTAL=c(0.4,0.4,0.5,0.5)
AF_latin=c(0.3,0.3,0.2,0.2)
df <- data.frame(snp_id,is_severe,encoding_1,encoding_2,encoding_0,chisq,pvalue,REF,ALT,AF_TOTAL,AF_latin)
  • Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – mhovd Dec 18 '22 at 12:53
  • @mhovd its not the same because I need to create "new columns" that include the combination of data of number of column example: is sever ==0 and encoding _0 becomes new column that contains data when the value of is severe is zero at the value from the encoding _0 column and its "new " name becomes is_sever_0_encoding_0 and for example chr21-10139833-A-C it has the value 7 – agnesa rivkin Dec 18 '22 at 12:57
  • 1
    Please post **reproducible data** – Chris Ruehlemann Dec 18 '22 at 13:02
  • The link you put can answer your question. see tidyr::spread() –  Dec 18 '22 at 13:03
  • @ChrisRuehlemann what do you mean by summarizing but im sort of creating "new columns " example: is_sever_0_encoding_0 column for the snp_id chr21-10139833-A-C it has the value 7 – agnesa rivkin Dec 18 '22 at 13:04
  • @CatededUr unfortunately i don't understand how to implement this in my case – agnesa rivkin Dec 18 '22 at 13:17
  • This seems like an XYZ-problem. – mhovd Dec 18 '22 at 13:36

1 Answers1

2

Is this what you need?

df %>%
   pivot_wider(names_from = is_severe,
               values_from = matches("encoding"),
               names_glue = "is_severe_{.name}") %>% 
   rename_with(~ str_replace_all(., "(is_severe_)(encoding_.)_(.)", "\\1\\3_\\2")) %>%
   select(snp_id, matches("is_severe"), everything())
# A tibble: 2 × 13
  snp_id             is_severe_0_encoding_1 is_severe_1_encoding_1 is_sev…¹ is_se…² is_se…³ is_se…⁴ chisq pvalue REF   ALT   AF_TO…⁵ AF_la…⁶
  <chr>                               <dbl>                  <dbl>    <dbl>   <dbl>   <dbl>   <dbl> <dbl>  <dbl> <chr> <chr>   <dbl>   <dbl>
1 chr21-10139833-A-C                      0                      0        1       0       7      13  1.71  0.191 A     C         0.4     0.3
2 chr21-10141374-T-C                      0                      0        1       0       7      13  1.71  0.191 T     C         0.5     0.2
# … with abbreviated variable names ¹​is_severe_0_encoding_2, ²​is_severe_1_encoding_2, ³​is_severe_0_encoding_0, ⁴​is_severe_1_encoding_0,
#   ⁵​AF_TOTAL, ⁶​AF_latin
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
  • thank you , but i need the columns to be in the name combination of example : is_sever=0 , encoding_0 will be the columns name is_severe_0_encoding_0 and the value will be the value of the encoding_0 column when the value of is severe column ==0 – agnesa rivkin Dec 18 '22 at 13:24
  • Check out my edited solution - better? – Chris Ruehlemann Dec 18 '22 at 13:36
  • is it possible to the is_severe_encoding column to come first ( before the chisq, pvalue) and to the names be like : is_severe_0_encoding_0 instead of is_seveere_encoding_0_0. and thank you so much for the help :) – agnesa rivkin Dec 18 '22 at 13:42
  • Now finally better? – Chris Ruehlemann Dec 18 '22 at 14:25