
I have a data.frame that looks like this:

ID   SNPIndex  A1  A2
ID1  1         A   B
ID1  2         B   B
ID1  3         A   B
ID2  1         A   B
ID2  2         B   B
ID2  3         A   A
ID3  1         B   B
....

and I would like for it to look like this:

ID   1_A1  1_A2  2_A1  2_A2  3_A1  3_A2
ID1  A     B     B     B     A     B
ID2  A     B     B     B     A     A
ID3  ...

That is, I would like one row for each ID and two columns for each SNPIndex, one holding the A1 value and one holding the A2 value.

I would really appreciate your help!

zzabaa
  • Welcome to SO! Could I get you to do a few things? Most importantly, could you please make this into a reproducible example, using either data you create within the question, data that you `dput`, or a built-in dataset? Could you please also review our formatting guidelines and check for prior answers? – Hack-R Jul 04 '16 at 13:50

2 Answers


I'm sure that a) this is a duplicate and b) my code can be simplified, but this appears to do what you are after:

# Example data: one row per ID, each with a different SNPIndex
dat <- data.frame(ID = c("ID1", "ID2", "ID3"),
                  SNPIndex = c(1, 2, 3),
                  A1 = c("A", "B", "A"),
                  A2 = c("B", "B", "B"),
                  stringsAsFactors = FALSE)

library(tidyr)
library(dplyr)

dat %>%
    gather(KEY, VALUE, A1, A2) %>%                 # stack A1/A2 into key-value pairs
    mutate(KEY = paste0(SNPIndex, "_", KEY)) %>%   # build column names such as "1_A1"
    select(-SNPIndex, -ID) %>%                     # dropping ID collapses everything into one row
    spread(KEY, VALUE)                             # one column per SNPIndex/allele pair
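
For the long layout shown in the question (several SNPIndex rows per ID), the same pipeline can be adapted by keeping `ID` so that `spread` produces one row per ID. This is only a sketch: the data frame `snps` and the result `wide` are illustrative names, and the values are copied from the first two IDs of the question's table.

```r
library(tidyr)
library(dplyr)

# Illustrative data copied from the question's table (ID1 and ID2 only)
snps <- data.frame(
  ID       = rep(c("ID1", "ID2"), each = 3),
  SNPIndex = rep(1:3, times = 2),
  A1       = c("A", "B", "A", "A", "B", "A"),
  A2       = c("B", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

wide <- snps %>%
  gather(KEY, VALUE, A1, A2) %>%                # stack A1/A2 into key-value pairs
  mutate(KEY = paste0(SNPIndex, "_", KEY)) %>%  # column names such as "1_A1"
  select(-SNPIndex) %>%                         # keep ID so rows stay separate
  spread(KEY, VALUE)                            # one row per ID

print(wide)
```

Because the new column names are character strings, they are likely to be sorted as text, so with ten or more SNPs "10_A1" would come before "2_A1"; reorder the columns afterwards if that matters.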
gowerc
  • Thank you so much! And my apologies for duplicating the question! – zzabaa Jul 05 '16 at 09:08
  • Actually, I stated my question wrongly ... My dataset actually looks like this (edited version). Sincerest apologies! – zzabaa Jul 05 '16 at 10:42

You can use `dcast` from the reshape2 package in a loop.

library(reshape2)
df <- data.frame(ID=c("ID1","ID2","ID3"),
                 SNPIndex=1:3,
                 A1=c("A","B","A"),
                 A2=c("B","B","B")
                 )

dummy <- rep(1,3)
number_of_As <- 2

for (i in 1:number_of_As) {
  rawdf <- dcast(df, dummy ~ SNPIndex, value.var=paste0("A",i))
  rawdf <- rawdf[,c(-1)]
  colnames(rawdf) <- paste0(1:3,paste0("_A",i))
  if (i == 1) {
    newdf <- rawdf
  } else {
    newdf <- cbind(newdf,rawdf)
  }
}

This will give you the result you want:

> newdf
  1_A1 2_A1 3_A1 1_A2 2_A2 3_A2
1    A    B    A    B    B    B

The trick is to use the dummy vector so that it collapses into a single row, which you can then bind into your desired data frame.
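
If the data has several SNPIndex rows per ID, as in the question's table, the dummy column and the loop may not be needed. A possible sketch, assuming the same column names as the question: melt the two allele columns into long form, then let `dcast` spread them with `ID` on the left-hand side of the formula. The names `snps`, `long`, `wide`, `allele` and `call` are made up for illustration.

```r
library(reshape2)

# Illustrative data copied from the question's table (ID1 and ID2 only)
snps <- data.frame(
  ID       = rep(c("ID1", "ID2"), each = 3),
  SNPIndex = rep(1:3, times = 2),
  A1       = c("A", "B", "A", "A", "B", "A"),
  A2       = c("B", "B", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Long form: one row per ID / SNPIndex / allele column
long <- melt(snps, id.vars = c("ID", "SNPIndex"),
             variable.name = "allele", value.name = "call")

# Wide form: one row per ID, columns named like "1_A1", "1_A2", ...
wide <- dcast(long, ID ~ SNPIndex + allele, value.var = "call")

print(wide)
```

Here `dcast` joins the two right-hand-side variables with an underscore, which gives the `1_A1`-style names asked for in the question.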

Anton
  • Actually, I made a mistake when presenting the dataset. I've corrected the post above; could you maybe help me modify this code to fit my actual dataset? Please, I am going crazy ;) – zzabaa Jul 05 '16 at 13:35