I have a PHYLIP-formatted text file of 300+ aligned COI sequences. I am trying to condense the sequences into haplotypes for analysis using an R script written by a friend. The part I am having trouble with is where the script compares each sequence to the following one and determines whether they differ at any position other than those where one of them has an N. It runs through the first few sequences before throwing the following error:
`Error in if (dif.nuc1[p] == "N" | dif.nuc2[p] == "N") { : missing value where TRUE/FALSE needed`
There are no gaps in the alignment, so there shouldn't be any need to handle NA values.
Any idea what the issue is and/or how to fix it? Alternatively, any recommendations for programs to consolidate haplotypes would also be appreciated.
Thank you in advance.
Below is the console output:
> if (sum(dif.nuc1=='N')==0 & sum(dif.nuc2=='N')==0){
+ } else if (length(dif.nuc1)!=0){
+ counter<- 0
+ for (p in 1:length(dif.nuc1)){
+ cat('p is', p, '\n')
+ if (dif.nuc1[p]== 'N'| dif.nuc2[p]== 'N'){
+ counter<- (counter + 1)
+ }
+ }
+ if (counter == length(dif.nuc1)){
+ hap.equiv<- c(hap.equiv, paste('Hap_', m, ' == Hap_', n, ' ', sep=''))
+ }
+ }
Error in if (dif.nuc1[p] == "N" | dif.nuc2[p] == "N") { :
missing value where TRUE/FALSE needed
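As far as I understand it, R raises this error whenever the condition passed to `if` evaluates to `NA`, which happens if either element being compared is `NA` (for example, when `p` runs past the end of the shorter vector). Below is a minimal sketch that reproduces the message. The vectors are made-up stand-ins for my real `dif.nuc1` and `dif.nuc2`, and the length mismatch is only an assumption about what might be going wrong:

```r
# Made-up stand-ins for dif.nuc1 / dif.nuc2 (not my real data).
dif.nuc1 <- c("A", "N", "T")
dif.nuc2 <- c("G", "C")          # one element shorter on purpose

# Indexing past the end of a vector returns NA rather than an error.
dif.nuc2[3]

# FALSE | NA is NA, and `if (NA)` fails with the error I am seeing.
res <- tryCatch(
  if (dif.nuc1[3] == "N" | dif.nuc2[3] == "N") "hit" else "miss",
  error = function(e) conditionMessage(e)
)
res

# Quick diagnostics that could be run just before the loop.
same_length <- length(dif.nuc1) == length(dif.nuc2)
has_na      <- anyNA(dif.nuc1) || anyNA(dif.nuc2)
```

If `same_length` is `FALSE` or `has_na` is `TRUE` for the real vectors at the iteration where the error appears, that would point to how `dif.nuc1` and `dif.nuc2` are built rather than to the alignment itself.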
I have tried modifying the code in the following ways to handle NA values, but this did not solve the issue:
if (dif.nuc1[p] == 'N' | dif.nuc2[p] == 'N' | is.na(dif.nuc1[p]) | is.na(dif.nuc2[p])) {
if (sum(dif.nuc1 %in% c('N', 'NA')) == 0 & sum(dif.nuc2 %in% c('N', 'NA')) == 0) {
} else if (length(dif.nuc1)!=0){
counter<- 0
for (p in 1:length(dif.nuc1)){
cat('p is', p, '\n')
if (dif.nuc1[p]== 'N'| dif.nuc2[p]== 'N'){
counter<- (counter + 1)
I have also double-checked my data to ensure there are no ambiguity codes or gaps that would cause NA values.
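For clarity, the comparison I am ultimately after is roughly the following: treat two sequences as equivalent when every position at which they differ has an N in at least one of them. This is only a sketch, not my friend's script. The function name `differ_only_by_n` and the example sequences are made up, and it assumes both sequences are equal-length strings with no missing values:

```r
# Hypothetical helper: TRUE when two aligned sequences differ only at
# positions where at least one of them has an N.
differ_only_by_n <- function(seq1, seq2) {
  s1 <- strsplit(toupper(seq1), "")[[1]]
  s2 <- strsplit(toupper(seq2), "")[[1]]
  stopifnot(length(s1) == length(s2))   # aligned sequences must match in length

  differs <- s1 != s2                   # positions where the bases disagree
  has_n   <- s1 %in% "N" | s2 %in% "N"  # positions where either base is N
  all(has_n[differs])                   # all(logical(0)) is TRUE for identical input
}

differ_only_by_n("ACGTN", "ACGTA")      # TRUE: the only difference involves an N
differ_only_by_n("ACGTT", "ACGTA")      # FALSE: T vs A is a real difference
```

Working on whole vectors avoids the `for` loop and the `if` inside it, so there is no per-position condition that could evaluate to `NA`.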