Afraid I can't paste a code example as my dataset is sensitive.
After some issues with our source files, we realised that the allele coding in our source file is inconsistent and we need to alter it. The first step is dropping the redundant column value (sometimes it's REF, sometimes ALT1); the third value, A1, is always used. All three are characters, and POSITION is a string.
Given the number of rows involved I've tried to set up a loop as follows:
- Go to next row
- Concatenate new identifier using A1 and whichever of REF and ALT1 does not equal A1
Looks simple enough in theory, but it just won't behave; on inspection it appears to correctly catch the first instance of the first line but not the others.
Is there a glaring mistake I've made somewhere? Thanks.
# NOTE: reversed in order to match mapping file formatting (equiv. to REF_ALT)
for (i in 1:nrow(Chr1_results.dt)) {
  if (Chr1_results.dt[i,]$A1 != Chr1_results.dt[i,]$ALT1) {
    Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$ALT1, sep = "_")
  } else {
    Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$REF, sep = "_")
  }
}
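For comparison, the same two steps could be sketched without a row loop, letting data.table update the whole column at once. This is only a sketch and not a diagnosis of the loop above: it assumes Chr1_results.dt really is a data.table, the column names are taken from the post, and toy.dt and every value in it are invented stand-ins for the sensitive data.

```r
library(data.table)

# Hypothetical stand-in for Chr1_results.dt; all values are made up.
toy.dt <- data.table(
  ID       = c("rs1", "rs2", "rs3"),
  REF      = c("A", "C", "G"),
  ALT1     = c("G", "T", "A"),
  A1       = c("A", "T", "G"),
  POSITION = c("", "", "")
)

# For every row, pick whichever of ALT1 / REF differs from A1
# (same condition as the loop), then build ID_A1_other in one pass.
toy.dt[, POSITION := paste(ID, A1, ifelse(A1 != ALT1, ALT1, REF), sep = "_")]

print(toy.dt$POSITION)
```

The `:=` operator modifies the table in place, so nothing is reassigned row by row. If any of A1, ALT1 or REF can be NA, the comparison yields NA and that case would need handling separately.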