Referring to row names as numbers in analysis (geiger package)

Question

I'm trying to run the tip.disparity function from the geiger package in R.

My data:

Family    Length   Wing    Tail  
Alced    2.21416 1.88129 1.66744 
Brachypt 2.36734 2.02373 2.03335 
Bucco    2.23563 1.91364 1.80675

When I use the function "name.check" to check that the names in my data match those on my tree, it returns

$data.not.tree
[1] "1" "10" "11" "12" "2" etc

This shows that it is referring to the names by number. I've tried converting them to a character vector, etc.

I've tried running it with

data.names=NULL

I'm looking simply to edit my data frame so that the package matches the names to those in my tree (tree is newick format)

Hope this is clearer. Thanks.

Care to show a snippet of the data, say via `head(foo)` where `foo` is your data frame? — Gavin Simpson, Aug 24 '11 at 17:12
We need a bit more context please: see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ... also, it's probably not obvious to the others on this list that you are working with phylogenetic data (although you do get points for mentioning the `geiger` package), and your 'data' may be in a phylogenetic tree object (`phylo`, documented in the `?ape::read.tree`). — Ben Bolker, Aug 24 '11 at 17:29
Nick: please use `dput` and embed the answer in an edited version of your question above, not in a comment. (If your tree is really big then you may need to show us a subset) — Ben Bolker, Aug 24 '11 at 17:29
Welcome to SO, Nick! Gavin suggested using `head(foo)` for a reason. It's not the values themselves we need to see, but the general structure of your data frame. Try again using `head(foo)`, or `dput(head(foo))` and put the results in your question, rather than a comment. — joran, Aug 24 '11 at 17:31

Ben Bolker · Accepted Answer · 2011-08-24T17:45:50.867

I believe the clue is in the documentation (?name.check):

data.names: names of the tips in the order of the data; if this is not
          given, names will be taken from the names or rownames of the
          object data

If you want the program to return the names of the taxa that are included in the data frame but not present in the tree, you either need to assign the corresponding names as row names of your data frame, or specify them separately in the data.names argument. Note that the default row names of a data frame are the character equivalent of the row number, exactly what you're seeing above ...
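To see where the numeric names come from, here is a minimal sketch in base R. The data frame below is a hypothetical reconstruction of the three rows shown in the question (the object name `traitdata` is an assumption), built without explicit row names:

```r
# Hypothetical data frame mirroring the structure shown in the question
traitdata <- data.frame(
  Family = c("Alced", "Brachypt", "Bucco"),
  Length = c(2.21416, 2.36734, 2.23563),
  Wing   = c(1.88129, 2.02373, 1.91364),
  Tail   = c(1.66744, 2.03335, 1.80675),
  stringsAsFactors = FALSE
)

# No row names were supplied, so R falls back to the row numbers,
# stored as character strings
default_names <- rownames(traitdata)
print(default_names)
# [1] "1" "2" "3"
```

These are the strings that get compared against the tip labels when nothing else is supplied, which is why they all show up under $data.not.tree.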

edit based on additional information above:

R can't guess (or doesn't want to) that the names are contained in the Family element of your data frame. Try:

name.check(tree, traitdata, data.names=as.character(traitdata$Family))

Probably better in the long run to do:

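# use the Family column as the row names
rownames(traitdata) <- as.character(traitdata$Family)
# drop the Family column so only the trait columns remain
traitdata <- subset(traitdata, select=-Family)
name.check(tree, traitdata)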

Because you don't want to have Family included in your data set of traits -- it's an identifier, not a trait ...
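As a rough illustration of the row-name approach without needing a tree object, the sketch below does the same kind of comparison in base R with `setdiff`. It is not geiger's implementation, only an approximation of the idea; the `tip_labels` vector is a hypothetical stand-in for `tree$tip.label`, and the data frame is reconstructed from the rows in the question:

```r
# Hypothetical data frame mirroring the structure shown in the question
traitdata <- data.frame(
  Family = c("Alced", "Brachypt", "Bucco"),
  Length = c(2.21416, 2.36734, 2.23563),
  Wing   = c(1.88129, 2.02373, 1.91364),
  Tail   = c(1.66744, 2.03335, 1.80675),
  stringsAsFactors = FALSE
)

# Hypothetical tip labels; with a real phylo object these would be tree$tip.label
tip_labels <- c("Alced", "Brachypt", "Bucco")

# Move the identifier into the row names and drop it from the trait columns
rownames(traitdata) <- as.character(traitdata$Family)
traitdata <- subset(traitdata, select = -Family)

# Names present on one side but not the other
data_not_tree <- setdiff(rownames(traitdata), tip_labels)
tree_not_data <- setdiff(tip_labels, rownames(traitdata))

print(data_not_tree)
print(tree_not_data)
```

Both vectors come back empty here because every row name has a matching tip label. With the default numeric row names, every row would land in `data_not_tree` instead.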

If you look at the structure of the example data given in the package:

data(geospiza)
geospiza.data

you can see that the taxon names are included as row names, not as a column in the data frame itself ...

PS it's not as nice an interface as StackOverflow, but there's a very friendly and active R-for-phylogeny mailing list at r-sig-phylo@r-project.org ...
