
I am writing code that takes microsatellite data and outputs a summary of it, such as the number of alleles, sample size, and counts of missing data. I have these working but am having trouble getting the allele frequencies. I keep getting this error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0

Based on my code, can anyone tell me what the issue is and how to fix it?

The data is an Excel file with 11 columns, but the first is excluded because it is not used in the calculations, leaving 10 columns to work with. There are five loci in diploid format, so two columns per locus. I have attached an image of what the data looks like. I should say I am a novice R user, so if my code is rough or even nonsensical at times, please forgive me. Gotta start somewhere.

Any input is appreciated.

geno_data<-read.csv("Armadillo_only.csv")

OUT<-NULL


allele_summary<-function(x){
    # Column 1 is not used in the calculations; the rest are allele calls
    num_alle<-colSums(!is.na(x[,-1]))
    tot_alle<-sum(num_alle)
    samp_size<-length(x[,1])
    # Missing data are counted as NA or 0
    na<-length(which(is.na(x[,-1])))
    zeros<-length(which(x[,-1]==0))
    missing_data<-sum(na,zeros)

    only_alleles<-(x[,-1])
    col_num<-ncol(only_alleles)

This is all one function, but the portion above is the part that works when run as a separate function.

    # Index of the first column of each locus (1, 3, 5, ...)
    loci<-(2*(unique(round((1:(col_num-2))/2)))+1)

    for (i in loci){
        # Pool both allele columns of the locus and count each allele
        a<-c(only_alleles[,i],only_alleles[,i+1])
        a2<-as.data.frame(table(a))
        missing<-a2[which(a2[,1]==0),]
        a3<-a2[-which(a2[,1]==0),]
        a4<-cbind(a3,a3[,2]/sum(a3[,2]))
        output<-cbind(i,a4)

        OUT <<- rbind(OUT,output)
    }
}

allele_summary(geno_data)

Sample data

Werner Hertzog
Jbrown
  • It's difficult to say what's going on: we can't run your code, because we don't have a sample of data, only a picture of a table. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. – camille Apr 10 '19 at 03:09
  • `output<-cbind(i,a4)` I suspect this specific error comes from here: `cbind()` won't work when you try to bind data frames with different numbers of rows, e.g. `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 632, 507`. With one argument having length 0, I don't think it is writing what you expect; perhaps the `i` on that line isn't actually an object in the for loop? – Roasty247 Apr 10 '19 at 03:41
  • I agree with @Roasty247. I would pull your code out of the function and loop and debug step by step, paying special attention to each `cbind()` call. – Raoul Duke Apr 10 '19 at 17:08
  • @Roasty247 Thank you all for your input, I was making this way more complicated than it actually was. I ended up abandoning the loop and went another direction. – Jbrown Apr 12 '19 at 10:31
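
One likely cause, given that the error reports row counts of 1 and 0: in the `a3` line, `which(a2[,1]==0)` returns an empty index whenever a locus contains no 0 codes, and negating an empty index selects zero rows rather than all of them. `cbind(i,a4)` is then asked to combine a length-1 `i` with a zero-row data frame. The sketch below reproduces this under the assumption of a locus without zeros; the allele values and the names `dropped`, `kept`, and `msg` are made up for illustration and are not from the original data.

```r
# Toy allele calls for one locus with no 0 (missing) codes; values are invented.
a <- c(120, 122, 122, 124)
a2 <- as.data.frame(table(a))

# which() finds no zeros, and a negated empty index selects zero rows
# instead of keeping every row.
dropped <- a2[-which(a2[, 1] == 0), ]
nrow(dropped)

# Binding a length-1 value to the zero-row data frame fails;
# the error message is captured here instead of stopping the script.
msg <- tryCatch(cbind(1, dropped), error = function(e) conditionMessage(e))
print(msg)

# A logical filter keeps all rows when there is nothing to drop.
kept <- a2[a2[, 1] != 0, ]
kept$freq <- kept$Freq / sum(kept$Freq)
print(kept)
```

If this is indeed the cause, replacing the `-which(...)` filter with the logical form `a2[a2[,1] != 0, ]` should let the loop run for loci that have no 0 codes.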

0 Answers