I'm analysing protein data sets and trying to build a tree with the phangorn package in R. When I construct the tree, I get negative edge lengths, which sometimes make it difficult to proceed with the analysis (modelTest). For larger datasets (more than 250 proteins), I can't run modelTest at all, apparently because of the negative edge lengths. For smaller datasets, however, modelTest runs even though some edge lengths are negative. I am running it directly from my terminal.

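    library(phangorn)

    # Read the amino-acid alignment; `file` holds the path to the FASTA file
    dat <- read.phyDat(file, format = "fasta", type = "AA")
    # Replace sequence IDs with organism names (rows must be in the same order as `dat`)
    tax <- read.table("organism_names.txt", sep = "\t", row.names = 1)
    names(dat) <- tax[, 1]
    # Maximum-likelihood distances under WAG, then a BIONJ starting tree
    distance <- dist.ml(dat, model = "WAG")
    tree <- bionj(distance)
    # Compare candidate amino-acid substitution models
    mt <- modelTest(dat, tree, model = c("WAG", "LG", "cpREV", "mtArt", "MtZoa", "mtREV24"), multicore = TRUE)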

    Error: NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In pml(tree, data) : negative edges length changed to 0!

Does anybody have any idea what I can do?

cheers, Alba

Marc in the box
aLbAc
  • Welcome to SO - your example is not reproducible because your data sets are not available. See if you can provide a small data example that would allow others to better provide an example. See the following link for tips on creating reproducible examples in R: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Marc in the box Oct 25 '13 at 10:16
  • The answer here, as noted in a comment below, was that phangorn.ModelTest was giving an error for data sets larger than 220 taxa. The problem was not the negative branch lengths. – Argalatyr Mar 27 '14 at 21:58
  • I encountered the same error message and the problem was that my tree was not ultrametric. – user1981275 Apr 09 '15 at 10:45

1 Answer

As @Marc said, your example isn't really reproducible...

If the problem really is negative or zero branch lengths, you could try setting them to a very small positive number, for instance:

    tree$edge.length[which(tree$edge.length <= 0)] <- 1e-7
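
If you want to see how many branches are affected before changing them, the same idea can be wrapped in a small helper. This is only a sketch: `fix_edge_lengths` is a hypothetical name (it is not part of phangorn or ape), the `1e-7` default is an arbitrary choice, and the toy tree below is a hand-built stand-in for the output of `bionj()`.

```r
# Hypothetical helper (not part of phangorn or ape): report and replace
# non-positive branch lengths on a "phylo"-style object.
fix_edge_lengths <- function(tree, eps = 1e-7) {
  # Flag edges that are zero or negative, ignoring missing values
  bad <- !is.na(tree$edge.length) & tree$edge.length <= 0
  message(sum(bad), " of ", length(bad), " edges were <= 0; set to ", eps)
  tree$edge.length[bad] <- eps
  tree
}

# Toy stand-in for a bionj() result: a 3-tip star tree built by hand,
# with one negative and one zero branch length.
toy <- structure(
  list(
    edge = matrix(c(4, 1,
                    4, 2,
                    4, 3), ncol = 2, byrow = TRUE),
    edge.length = c(0.12, -0.003, 0),
    tip.label = c("A", "B", "C"),
    Nnode = 1L
  ),
  class = "phylo"
)

fixed <- fix_edge_lengths(toy)
stopifnot(all(fixed$edge.length > 0))
```

With your real data you would call `tree <- fix_edge_lengths(tree)` right after `bionj(distance)` and before `modelTest()`. If the error persists after that, the branch lengths are probably not the cause.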

Another tip is to subscribe to R-sig-phylo, a mailing list about phylogenetics in R. People there are really knowledgeable and usually respond pretty fast.

dudu
  • Thank you very much for your answer, dudu. I tried to make the example reproducible, but in the end I determined that the largest data set phangorn can analyse is 220 sequences, so modelTest gave me an error whenever I tried bigger data sets. – aLbAc Nov 27 '13 at 23:26