I am writing code that takes microsatellite data and outputs a summary of it: number of alleles, sample size, counts of missing data, etc. I've gotten these to work but am having trouble getting the allele frequencies. I keep getting this error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0
Based on my code, can anyone tell me what the issue is and how to fix it?
The data is an Excel file that has 11 columns, but the first one is excluded because it is not used in the calculation, leaving me with 10 columns to work with. There are five loci in diploid format, so two columns per locus. I have attached an image of what the data looks like. I should say I am a novice R user, so if my code is rough or even nonsensical at times, please forgive me. Gotta start somewhere.
Any input is appreciated.
geno_data<-read.csv("Armadillo_only.csv")
OUT<-NULL
allele_summary<-function(x){
  num_alle<-colSums(!is.na(x[,-1]))
  tot_alle<-sum(num_alle)
  samp_size<-length(x[,1])
  na<-length(which(is.na(x[,-1])))
  zeros<-length(which(x[,-1]==0))
  missing_data<-sum(na,zeros)
  only_alleles<-(x[,-1])
  col_num<-ncol(only_alleles)
  # This is all one function, but the portion above is what I got to work when run as a separate function.
  loci<-(2*(unique(round((1:(col_num-2))/2)))+1)
  for (i in loci){
    a<-c(only_alleles[,i],only_alleles[,i+1])
    a2<-as.data.frame(table(a))
    missing<-a2[which(a2[,1]==0),]
    a3<-a2[-which(a2[,1]==0),]
    a4<-cbind(a3,a3[,2]/sum(a3[,2]))
    output<-cbind(i,a4)
    OUT <<- rbind(OUT,output)
  }
}
allele_summary(geno_data)
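
A minimal sketch of what seems to trigger the error, assuming at least one locus has no 0 entries. The allele values are made up for illustration, and `zero_rows`, `a3_bad`, `a3_ok` and `prop` are hypothetical names that are not in the code above. When `which()` finds nothing it returns `integer(0)`, and negative indexing with `integer(0)` drops every row rather than none, so the later `cbind(i, a4)` tries to combine a length-1 value with a 0-row data frame.

```r
# Hypothetical toy locus with no 0 (missing) entries; values are made up.
a <- c(101, 103, 101, 105, 103, 101)
a2 <- as.data.frame(table(a))

# No allele equals 0, so which() returns integer(0) ...
zero_rows <- which(a2[, 1] == 0)

# ... and a2[-integer(0), ] selects zero rows instead of keeping them all.
a3_bad <- a2[-zero_rows, ]
nrow(a3_bad)

# Combining a single value with the 0-row data frame raises the error.
res <- try(cbind(1, a3_bad), silent = TRUE)
inherits(res, "try-error")

# Logical indexing keeps every row when nothing matches.
a3_ok <- a2[a2[, 1] != 0, ]
prop <- a3_ok$Freq / sum(a3_ok$Freq)
a4 <- cbind(locus = 1, a3_ok, prop = prop)
a4
```

If that is the cause, swapping the `-which(...)` line inside the loop for the logical-index form may be enough. Loci that do contain zeros would behave as before.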