I am having difficulty applying the estimateD function from the iNEXT package to my own data. I work on bees and have a very large dataset of record counts in grid cells covering a particular region. I want to compute Hill diversities for each grid cell by rarefying by size and also by coverage. Neither method works on my data; here I report the error I get with the base="size" argument.

To use the function, I took a species x sites (= grid cells) matrix and converted it into a list, as in the example from the function's documentation:

library(iNEXT)
data(spider)
iNEXT::estimateD(spider, datatype="abundance", base="size", level=NULL, conf=NULL)

I have data for 656 bee species in 4428 grid cells. When I run the function on the full dataset, I get the following error: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows

But when I subset the list to a much smaller number of grid cells, the function may succeed. Here is a reprex containing 67 grid cells. I apologize for its size, but it is the smallest subset for which I get the error.

List1=list(col1 = c(4, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1), col2 = c(1, 3,3, 3, 3, 3, 3, 4, 3, 3, 3, 1, 3, 3, 3),
           col3 = c(3, 6, 2, 1,7, 7, 5), col4 = c(2, 4, 2, 3, 3, 2, 4, 5, 5, 4, 3, 5, 3),
           col5 = c(6,1, 3, 4, 2, 2, 2), col6 = c(4, 8, 1, 1, 4, 1, 8, 9, 5, 2, 9,7, 1, 11, 4, 1, 2),
           col7 = c(1, 1, 2, 1, 2, 3, 7, 8, 5, 6, 6, 4, 10, 1, 1, 1), col8 = c(2, 1, 3, 1, 1, 1, 1, 2, 2, 1, 2, 2,2, 2, 1, 2, 1),
           col9 = c(2, 4, 4, 3, 3, 3, 2, 2, 5), col10 = c(3,2, 2, 2, 5, 4, 4, 5, 1), 
           col11 = c(4, 2, 2, 4, 3, 2, 4, 4, 2, 4, 1), col12 = c(1, 1, 3, 1, 3, 2, 1, 5, 6, 2, 5),
           col13 = c(2,4, 2, 1, 1, 5, 1, 2, 4, 2, 3, 1, 2, 1, 1), 
           col14 = c(3, 2, 14,31, 8, 3, 1, 7, 5, 21, 6, 21, 43, 26, 2, 33, 16, 20, 7, 3, 18, 2, 1, 1), 
           col15 = c(2, 2, 10, 2, 3, 2, 5, 2, 9, 1, 8, 6, 7, 3, 7, 1, 2, 2, 5, 1, 1, 1, 1, 3, 1, 3),
           col16 = c(4, 1, 1, 1, 4,3, 1, 1, 3, 1, 1),
           col17 = c(4, 8, 1, 1, 1, 1, 1, 2, 1), col18 = c(3,2, 1, 2, 1, 1, 1, 1, 3, 3, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1),
           col19 = c(4,4, 4, 9, 2, 7, 6, 2, 9), col20 = c(3, 4, 1, 2, 5, 4, 1, 1, 2), 
           col21 = c(2, 2, 2, 1, 1, 3, 1, 2, 1, 1, 2, 2, 1, 1), col22 = c(2, 7, 1, 1, 2, 2, 5, 3, 3, 1, 1, 4, 2), 
           col23 = c(1, 5, 1, 1,3, 2, 1, 1, 1, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 1), 
           col24 = c(7, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1), col25 = c(3, 3, 1, 3, 3,3, 3, 1, 2, 4, 3, 5), 
           col26 = c(11, 2, 1, 7, 5, 8, 11), col27 = c(3,4, 10, 1, 10, 3, 9), col28 = c(4, 1, 1, 4, 1, 2, 3, 1, 3), 
           col29 = c(3, 1, 1, 2, 3, 4, 2, 2, 4, 5, 10, 1, 6, 2, 6, 1,6, 8, 11, 1, 1, 1, 1), 
           col30 = c(5, 1, 2, 2, 3, 3, 3, 1,2, 2, 2, 1, 1, 1, 3, 1, 1), 
           col31 = c(2, 5, 4, 2, 1, 1, 1,1, 1, 1, 1, 2, 3, 1, 1, 1), 
           col32 = c(2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), 
           col33 = c(4, 1, 2,2, 5, 3, 2, 6, 7, 2, 3, 5), 
           col34 = c(1, 3, 1, 3, 3, 7, 1,1, 2, 2), col35 = c(1, 7, 6, 2, 7, 12, 2, 2, 3, 3, 2, 7), 
           col36 = c(6, 1, 3, 1, 14, 3, 2, 4, 1), col37 = c(5, 2, 1,1, 2, 1, 1, 2, 3, 1, 1, 3, 1), 
           col38 = c(1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), col39 = c(4, 2,2, 1, 2, 4, 2, 2, 2), 
           col40 = c(2, 3, 1, 3, 4, 4, 1, 3, 4,1), col41 = c(2, 1, 2, 2, 2, 4, 5, 6, 6, 13, 7, 10, 3, 8,1, 1, 1, 1, 1),
           col42 = c(4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 3), 
           col43 = c(3, 1, 2, 1, 3, 2, 3, 2, 3, 2), col44 = c(3,1, 1, 1, 5, 3, 1, 3, 2, 1), 
           col45 = c(3, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 5, 1, 2, 2), col46 = c(1, 5, 4, 1, 1, 2, 1,2, 2, 1, 5, 3, 2, 4, 2, 1, 2, 2), 
           col47 = c(3, 3, 2, 1, 2, 1, 2, 4, 1, 2, 1), col48 = c(2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 2, 5, 1, 3, 6),
           col49 = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3, 1), 
           col50 = c(4, 1, 5, 1, 5, 4, 6, 9, 5, 10, 14, 2, 4, 6, 4), col51 = c(1,3, 2, 3, 5, 4, 2, 1, 3, 2), 
           col52 = c(1, 3, 2, 3, 3, 2, 3,2, 2, 2, 1, 2, 4), col53 = c(3, 1, 2, 3, 2, 4, 2, 3, 2, 2), 
           col54 = c(7, 6, 6, 7, 3, 1, 1, 3, 1, 1),
           col55 = c(4, 1, 3, 4, 2, 1, 2, 4, 1, 4, 4, 4, 1), col56 = c(2, 3, 3, 1,3, 4, 2, 2, 2, 4), 
           col57 = c(1, 1, 4, 6, 2, 7, 4, 3, 10,7, 3, 1, 9, 3), col58 = c(5, 5, 1, 1, 3, 3, 3, 4, 2, 2, 2), 
           col59 = c(8, 8, 2, 3, 2, 2, 2, 1, 3, 1, 2, 2, 2, 1), col60 = c(4,1, 2, 6, 3, 2, 1, 2, 3, 1, 2, 3, 2, 1), 
           col61 = c(6, 2, 2,2, 4, 2, 2, 5, 5, 1, 2, 6, 2, 1), 
           col62 = c(7, 5, 8, 3, 1,2, 2, 2, 2, 1, 3, 1, 1, 1, 3, 1, 3, 2, 3, 1, 1, 3, 1, 1), 
           col63 = c(2, 3, 3, 1, 3, 1, 4, 1, 4, 3, 2, 4), col64 = c(3,2, 3, 2, 2, 5, 2, 3, 6, 1, 6, 5, 2, 6, 1, 3), 
           col65 = c(1, 2, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
           col66 = c(3, 2,1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 5, 1, 1, 4, 1,1, 1, 3, 1), 
           col67 = c(4, 3, 2, 2, 1, 1, 1, 2, 2, 1, 2))

by_size <- iNEXT::estimateD(List1,
                            datatype = "abundance", base = "size",
                            level=NULL, conf=NULL)
#Error in data.frame(..., check.names = FALSE) : 
#arguments imply differing number of rows: 67, 66

The list I provided contains no zeros, so the grid cells do not all have the same number of species, but the same error appears when every grid cell has the same number of species (padded with zeros). I dropped the zeros to keep the reprex as small as possible.

Now if we reduce the list by removing one grid cell (or more), the function works:

List2=List1[1:66]
by_size2 <- iNEXT::estimateD(List2,
                            datatype = "abundance", base = "size",
                            level=NULL, conf=NULL)

I am just trying to understand why this error occurs. If you have faced this problem before, please let me know. I would be grateful for a suggestion on how to proceed, or an explanation of why it is not working.

Thank you very much in advance!

  • Although not an answer, Jens's comment below is helpful. First thing is that if you set `names(List1)<-NULL` you do not get this error. I'm looking at where the names become an issue. – Michael Roswell Jul 21 '22 at 20:49
  • I think this occurs with the line `tmp <- tmp[!duplicated(tmp), ]` in the function `iNEXT::estimateD()`. I am not yet sure what this line is designed to do. – Michael Roswell Jul 21 '22 at 20:58
  • Turns out this was documented in at least one place: https://github.com/JohnsonHsieh/iNEXT/issues/67#issue-1064311383 – Michael Roswell Jul 21 '22 at 21:00

2 Answers

I don't have the answer, but your example also fails when you remove an element other than the last one. So the list length itself does not seem to be the problem.

List3=List1[-66]
by_size3 <- iNEXT::estimateD(List3,
                            datatype = "abundance", base = "size",
                            level=NULL, conf=NULL)
#Error in data.frame(..., check.names = FALSE) : 
#  arguments imply differing number of rows: 66, 65

I came here because I stumbled upon a possibly related issue in the same package: unnamed lists passed to the iNEXT function also produce a similar error.

# Two random 10 x 10 incidence matrices (species in rows), in an unnamed list
test_data <- list(matrix(sample(c(0, 1), 100, replace = TRUE),
                         ncol = 10,
                         dimnames = list(paste0("species_", 1:10),
                                         NULL)),
                  matrix(sample(c(0, 1), 100, replace = TRUE),
                         ncol = 10,
                         dimnames = list(paste0("species_", 1:10),
                                         NULL))
                  )

iNEXT(test_data,
      datatype = "incidence_raw")
#Error in data.frame(site = rownames(out), out) : 
#  arguments imply differing number of rows: 0, 2

#While this works
names(test_data)
names(test_data) <- c("community_1", "community_2")

iNEXT(test_data,
      datatype = "incidence_raw")
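
If a list may arrive without names, one option is to fill them in before calling iNEXT. The helper below is a minimal base-R sketch; ensure_names and the "community_" prefix are hypothetical names chosen for illustration and are not part of the iNEXT package.

```r
# Hypothetical helper (not part of iNEXT): give every list element a name,
# keeping any names that are already present.
ensure_names <- function(x, prefix = "community_") {
  nm <- names(x)
  if (is.null(nm)) nm <- rep("", length(x))       # fully unnamed list
  missing <- is.na(nm) | nm == ""                 # unnamed positions
  nm[missing] <- paste0(prefix, seq_along(x))[missing]
  names(x) <- nm
  x
}

unnamed <- list(c(1, 0, 1), c(0, 1, 1))
named <- ensure_names(unnamed)
names(named)
```

The result can then be passed on in place of the original list, e.g. iNEXT(ensure_names(test_data), datatype = "incidence_raw").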
Jens Åström
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30536405) – jpsmith Dec 09 '21 at 02:05
  • jpsmith is definitely correct but also, this was very helpful. I'm going to upvote it b/c I think it makes sense to reward attempts to narrow the scope of issues and sharpen questions. – Michael Roswell Jul 22 '22 at 13:08

OK, I think two things in iNEXT::estimateD contribute to this frustrating behavior. The first is the filtering of duplicates (whose purpose I don't understand). The second is name matching, which is what @JensÅström mentioned.

In your dataset, the first and last elements have the same abundances (in a different order): all.equal(sort(List1[[1]]), sort(List1[[67]])) returns TRUE.

This means that when iNEXT::estimateD filters out duplicates with the line tmp <- tmp[!duplicated(tmp), ], the row for the 67th element is removed. I don't yet see why this is desirable, and I feel it should come with a warning; there is nothing inherently weird about having multiple samples with the same abundances. In fact, without the name-matching code afterwards, this subsetting would happen silently.

# L3 is an MRE
L3<-List1[c(67,1)]

by_size <- iNEXT::estimateD(L3,
                            datatype = "abundance", base = "size",
                            level=20, conf=NULL)

# dropping names changes the behavior still. why?
L4<-L3
names(L4)<-NULL
by_size <- iNEXT::estimateD(L4,
                            datatype = "abundance", base = "size",
                            level=20, conf=NULL)



duplicated(L3)
duplicated(L4)
# ok, so it's not exactly with the behavior of `duplicated` that things go awry 
# (but weird that the 2nd duplicate is dropped...)
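
To check a larger list for this situation before calling estimateD, something like the following base-R sketch could help. find_duplicate_sites is a hypothetical helper, not an iNEXT function, and it assumes a named list of abundance vectors. It treats two sites as duplicates when their sorted abundances are identical, which is the case that bites here; it makes no claim to catch every way two output rows could coincide.

```r
# Hypothetical helper (not part of iNEXT): report list elements whose sorted
# abundance vector repeats that of an earlier element.
find_duplicate_sites <- function(x) {
  # One string key per site, independent of the order of the abundances
  keys <- vapply(x, function(v) paste(sort(v), collapse = ","), character(1))
  first <- match(keys, keys)   # index of the first site sharing each key
  dup <- duplicated(keys)      # TRUE for every later repeat
  data.frame(site = names(x)[dup],
             same_as = names(x)[first[dup]],
             stringsAsFactors = FALSE)
}

# Toy data: "c" holds the same abundances as "a", in a different order
toy <- list(a = c(4, 2, 1), b = c(3, 3), c = c(1, 2, 4))
dups <- find_duplicate_sites(toy)
dups
```

Applied to List1 from the question, this should pair col67 with col1, matching the all.equal check above.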

The error itself arises from how iNEXT::estimateD returns the names of your input as a column. The last bit of the function:

nam <- names(x)
if (is.null(nam)) {
    tmp
}
else if (ncol(tmp) == 6) {
    tmp <- cbind(site = nam, tmp)
}
else {
    tmp <- cbind(site = rep(nam, each = 3), tmp)
}
rownames(tmp) <- NULL
tmp

When you cbind the names (nam) with the output (which has had that duplicate dropped for unknown reasons), you get that error; nam and tmp have different lengths.
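
The mechanism can be reproduced in base R without iNEXT. In the sketch below, tmp is a made-up stand-in for the per-site results table (the column names and values are invented for illustration), and nam plays the role of names(x).

```r
# Made-up stand-in for the results table: sites a and c yield identical rows
tmp <- data.frame(m = c(20, 20, 20), qD = c(5.1, 7.3, 5.1))
nam <- c("site_a", "site_b", "site_c")

# Same filtering step as in estimateD: the repeated third row is dropped
tmp <- tmp[!duplicated(tmp), ]
nrow(tmp)

# Three names no longer fit two rows, so cbind() fails; the error message
# is captured here instead of stopping the script
res <- tryCatch(cbind(site = nam, tmp),
                error = function(e) conditionMessage(e))
res
```

Here res holds the "arguments imply differing number of rows" message, the same kind of error reported in the question.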

Michael Roswell
  • I guess for me the bottom line would be if I run into this issue, I'll use lower-level `iNEXT` functions in custom wrappers whose behavior I think I understand; it sounds like the `iNEXT` package authors have been slow to respond to this particular issue: https://github.com/JohnsonHsieh/iNEXT/issues/67 – Michael Roswell Jul 22 '22 at 13:06
  • I added a PR and it looks like someone else had as well. I also emailed Anne Chao. Hopefully this issue can be resolved. – Michael Roswell Jul 22 '22 at 13:51
  • In the meantime, perhaps installing the version from one of the PR forks will allow users to avoid this issue: https://github.com/JohnsonHsieh/iNEXT/pulls – Michael Roswell Jul 22 '22 at 13:53
  • Looks like the iNEXT developers have resolved this: https://github.com/JohnsonHsieh/iNEXT.git – Michael Roswell Jul 27 '22 at 14:45