I am trying to run a wilcox.test() on two subsets of data from a data frame. They are not of equal length (48 vs. 260). I want to see if there is a difference between the dbh (diameter at breast height) of live oak trees and water oak trees.

Pine_stand <- read.csv("Pine_stand.csv")
live_oaks <- subset(Pine_stand,Species=="live oak",select=c("dbh"));live_oaks
water_oaks <- subset(Pine_stand,Species=="water oak",select=c("dbh"));water_oaks

wilcox.test(live_oaks~water_oaks,conf.int=T,correct=F)
Error in model.frame.default(formula = live_oaks ~ water_oaks) : 
  invalid type (list) for variable 'live_oaks'

That was my first attempt. Then I tried this:

Pine_stand <- read.csv("Pine_stand.csv")
live_dbh <- subset(Pine_stand,Species=="live oak",select=c("dbh"));live_oaks
water_dbh <- subset(Pine_stand,Species=="water oak",select=c("dbh"));water_oaks
oaks<-c(live_dbh,water_dbh)
wilcox.test(dbh~Species,data=oaks)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 48, 260

and received that error. I have tried converting the two groups to vectors and appending them, and I have tried tapply ... I know there is a simple answer I am overlooking; I just can't get it to work. All of the examples I have read compare two vectors of the same length. I know I can do the Wilcoxon test by hand when the group sizes differ, so there should be a way. Any advice is welcome.

Ben Bolker
m9000
  • Please provide a reproducible example (see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451#5965451), and especially add some data we can use :) – AaronP Dec 10 '17 at 21:49

2 Answers

Yes, you can run a wilcox.test for variables of different length. As stated in http://www.r-tutor.com/elementary-statistics/non-parametric-methods/mann-whitney-wilcoxon-test

“Using the Mann-Whitney-Wilcoxon Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution.”

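It is therefore a non-parametric alternative to the two-sample t-test that we can use when the assumptions of the t-test are doubtful (for example, when the data are not normally distributed). Because the test is unpaired and works on the ranks of the pooled observations, the two samples do not need to be the same size.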
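
As a minimal sketch of that point, the default two-vector interface of wilcox.test accepts samples of different lengths. The dbh values below are invented for illustration and are not taken from Pine_stand.csv:

```r
# Hypothetical dbh values, invented for illustration only
live  <- c(30.1, 42.5, 38.0, 51.2, 45.3)
water <- c(22.4, 35.6, 28.9, 31.0, 26.7, 40.2, 24.8, 33.3)

# Unpaired rank-sum test: x and y may differ in length (5 vs. 8 here)
res <- wilcox.test(live, water)
res$p.value
```

With the data frames from the question, the equivalent call would presumably be wilcox.test(live_oaks$dbh, water_oaks$dbh), since $dbh extracts the column as a plain numeric vector.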

The problem in your code is that with these two statements:

live_dbh <- subset(Pine_stand,Species=="live oak",select=c("dbh"))
water_dbh <- subset(Pine_stand,Species=="water oak",select=c("dbh"))

you are creating two one-column data frames that contain only the dbh values, so you lose the labels (Species). Therefore you should write:

live_dbh <- subset(Pine_stand,Species=="live oak",select=c("dbh", "Species"))
water_dbh <- subset(Pine_stand,Species=="water oak",select=c("dbh", "Species"))

Second, when you try to merge the two sets with this code:

oaks<-c(live_dbh,water_dbh)

you create a list instead of a data frame. Why does that happen? According to the documentation for c(), the function is meant to “Combine Values into a Vector or List”. You have probably already used it to merge two vectors into one. However, subset() returns a data frame, not a vector, even when you select a single column. Our live_dbh and water_dbh sets are therefore data frames (and, with the label added, they now have two columns), and since a data frame is internally a list of columns, c() simply concatenates those columns into one longer list.
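
This can be checked on a small stand-in for Pine_stand. The data frame toy and its values are made up for illustration; only the column names follow the question:

```r
# Toy stand-in for Pine_stand (values invented for illustration)
toy <- data.frame(
  Species = c("live oak", "live oak", "water oak", "water oak", "water oak"),
  dbh     = c(40.2, 35.1, 22.8, 30.5, 27.9)
)

live_dbh  <- subset(toy, Species == "live oak",  select = c("dbh", "Species"))
water_dbh <- subset(toy, Species == "water oak", select = c("dbh", "Species"))

class(live_dbh)      # "data.frame", not a vector

combined <- c(live_dbh, water_dbh)
class(combined)      # "list"
names(combined)      # "dbh" "Species" "dbh" "Species"
lengths(combined)    # 2 2 3 3: four columns of unequal length
```

A list whose elements have different lengths cannot be turned into a data frame, which is what the "arguments imply differing number of rows" error in the question is complaining about.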

For one-column data frames you can use c() with the recursive argument set to TRUE to merge them into a single vector:

total<-c(one_column_df1, one_column_df2, recursive=TRUE)

However, it is usually safer to use rbind() (short for "row bind"), which stacks the rows of one data frame under the other and, unlike c(), also works for data frames with more than one column.

oaks<-rbind(live_dbh,water_dbh)

Now you should be able to run a wilcox.test:

wilcox.test(dbh~Species,data=oaks)
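
Put together, the whole workflow might look like the sketch below. The data frame toy, the third species and all dbh values are invented for illustration; with the real data you would start from Pine_stand instead:

```r
# Toy stand-in for Pine_stand with a third species (values invented)
toy <- data.frame(
  Species = c(rep("live oak", 4), rep("water oak", 6), rep("loblolly pine", 3)),
  dbh     = c(40.2, 35.1, 44.7, 38.3,
              22.8, 30.5, 27.9, 25.4, 31.6, 29.0,
              50.3, 47.8, 52.1)
)

# Keep the Species label alongside dbh in each subset
live_dbh  <- subset(toy, Species == "live oak",  select = c("dbh", "Species"))
water_dbh <- subset(toy, Species == "water oak", select = c("dbh", "Species"))

# Stack the two subsets row-wise into one data frame
oaks <- rbind(live_dbh, water_dbh)
nrow(oaks)             # 10 rows: 4 live oak + 6 water oak
table(oaks$Species)

# Formula interface: response ~ two-level grouping variable
fit <- wilcox.test(dbh ~ Species, data = oaks, conf.int = TRUE)
fit$p.value
```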
Sonia

How about

wilcox.test(dbh~Species, data=Pine_stand, 
            subset=(Species %in% c("live oak", "water oak")))

? (If these are the only two species in your data set, you don't need the subset argument.)
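
As a rough check on made-up data (the data frame toy and its values are invented, not from Pine_stand.csv), the subset argument should give the same result as extracting the two groups by hand:

```r
# Toy stand-in for Pine_stand with a third species (values invented)
toy <- data.frame(
  Species = c(rep("live oak", 4), rep("water oak", 6), rep("loblolly pine", 3)),
  dbh     = c(40.2, 35.1, 44.7, 38.3,
              22.8, 30.5, 27.9, 25.4, 31.6, 29.0,
              50.3, 47.8, 52.1)
)

# Formula interface, restricted to the two oak species
via_subset <- wilcox.test(dbh ~ Species, data = toy,
                          subset = Species %in% c("live oak", "water oak"))

# Same comparison with the two groups pulled out as plain vectors
via_vectors <- wilcox.test(toy$dbh[toy$Species == "live oak"],
                           toy$dbh[toy$Species == "water oak"])

# Compare the two p-values
all.equal(via_subset$p.value, via_vectors$p.value)
```

The formula version avoids creating intermediate objects, which is why it is usually the more convenient of the two.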

Ben Bolker