Afraid I can't paste a code example as my dataset is sensitive.

After some issues with our source files, we realised that the allele coding in our source file is inconsistent and needs to be altered. The first step is to drop the redundant column value (sometimes it's REF, sometimes ALT1); the third value, A1, is always used. All three are characters, and POSITION is a string.

Given the number of rows involved I've tried to set up a loop as follows:

  • Go to next row
  • Concatenate new identifier using A1 and whichever of REF and ALT1 does not equal A1

Looks simple enough in theory, but it just won't behave; on inspection it appears to correctly catch the first instance of the first line but not the others.

Is there a glaring mistake I've made somewhere? Thanks.

    # NOTE: reversed in order to match mapping file formatting (equiv. to REF_ALT)
    for (i in 1:nrow(Chr1_results.dt)) {
      if (Chr1_results.dt[i,]$A1 != Chr1_results.dt[i,]$ALT1) {
        Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$ALT1, sep = "_")
      } else {
        Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$REF, sep = "_")
      }
    }
Willow
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. We don't need to see your actual sensitive data; you can include fake data as long as it clearly demonstrates what the problem is. – MrFlick May 05 '20 at 19:23
  • If you cannot disclose your data then please share some dummy data that gives us some idea of it, e.g. like Chr1_results.dt <- data.frame(A1 = 1:5, ALT1 = c(11,2,3,4,15), REF = c(1,22,23,4,5), ID = letters[1:5], POSITION = character(5), stringsAsFactors = FALSE) – Jan May 06 '20 at 17:59
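
For illustration only, here is a minimal sketch of the two steps described in the question, written without a row-by-row loop. It does not diagnose why the loop misbehaves. It assumes a plain base R data frame with character columns ID, REF, ALT1 and A1; every value below is invented, and `other_allele` is a hypothetical helper name that does not appear in the question.

```r
# Made-up rows standing in for the sensitive data (all values are invented).
Chr1_results.dt <- data.frame(
  ID       = c("1:1000", "1:2000", "1:3000"),
  REF      = c("A", "G", "T"),
  ALT1     = c("C", "T", "G"),
  A1       = c("C", "G", "G"),
  POSITION = character(3),
  stringsAsFactors = FALSE
)

# For every row at once, pick whichever of REF / ALT1 does not equal A1.
# (other_allele is a hypothetical name used only in this sketch.)
other_allele <- ifelse(Chr1_results.dt$A1 != Chr1_results.dt$ALT1,
                       Chr1_results.dt$ALT1,
                       Chr1_results.dt$REF)

# Build the identifier as ID_A1_other, matching the order used in the loop.
Chr1_results.dt$POSITION <- paste(Chr1_results.dt$ID,
                                  Chr1_results.dt$A1,
                                  other_allele,
                                  sep = "_")

print(Chr1_results.dt)
```

Because `ifelse()` and `paste()` are vectorised, the whole column is filled in one pass, which may also be noticeably faster than looping when there are many rows. Note that this sketch assumes there are no missing alleles; a row with `NA` in A1 or ALT1 would end up with the literal text "NA" in its identifier.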

0 Answers