I get a strange error when knitting this R Markdown into an HTML file. I think it has to do with some sort of incompatibility in the dplyr package with knitr.

UPDATE: Following a suggestion in the comments not to use cbind with dplyr, I replaced the cbind call with dplyr::bind_cols. However, I now get a different, equally incomprehensible error:

library(dplyr)
counts.all <- bind_cols(count.tables[["SF10281"]], count.tables[["SF10282"]])

The error I get with this change (again, only when knitting):

Error in eval(expr, envir, enclos) : not compatible with STRSXP
Calls: <Anonymous> ... withVisible -> eval -> eval -> bind_cols -> cbind_all -> .Call


Previous error with cbind instead of dplyr::bind_cols:

Running the chunks separately works fine, and I was able to knit fine until I added the last chunk (using select from dplyr).

This is the error I get:

Quitting from lines 75-77 (Analysis_SF10281_SF10282_Sep29_2015.Rmd) 
Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "NULL"
Calls: <Anonymous> ... withVisible -> eval -> eval -> <Anonymous> -> select_

This is the entire Rmd file:

Read-in gene count tables into a single list of data frames (one data frame per sample):

```{r}
count.files <- list.files(pattern = "^SF[0-9]+_counts.txt$")

count.tables <- lapply(count.files, read.table, header=T, row.names=1)

names(count.tables) <- gsub("\\_counts.txt", "", count.files)
```

Remove gene metadata columns:

```{r}
count.tables <- lapply(count.tables, `[`, -(1:5))
```

Rename cells (columns) to short version:

```{r}
count.tables <- lapply(count.tables, function(x) {names(x) <- gsub("X.diazlab.aaron.tophat_out.SF[0-9]+.Sample_(SF[0-9]+).[0-9]+.([A-Z][0-9]+).accepted_hits.bam", "\\1-\\2", names(x)); x})
```

Save object to file for later:

```{r}
saveRDS(count.tables, file="gliomaRawCounts_10281_10282_10345_10360.rds")
```

Make a single data frame with all 4 samples (384 cells), and write to text file:

```{r}
counts.all <- cbind(count.tables[["SF10281"]], count.tables[["SF10282"]], count.tables[["SF10345"]], count.tables[["SF10360"]])

write.table(counts.all, file="gliomaRawCounts_10281_10282_10345_10360.txt", sep="\t", quote=F, col.names=NA)
```

Read metadata. Do not assign cell ID column as row.names, for compatibility with dplyr.

```{r}
meta <- read.delim("QC_metrics_SCell_SF10281_SF10282_SF10345_SF10360.txt", check.names = F, stringsAsFactors = F)
```

Filter cells based on live/dead/multi calls. Exclude empty, red-only, and multi-cell wells:

```{r, results='hide', message=FALSE, warning=FALSE}
library(dplyr)
meta.select <- filter(meta, grepl("^1g", `Live-dead_call`))
```

Filter cells based on 1,000 gene threshold:

(Includes 12 'FAIL' cells)
```{r}
meta.select <- filter(meta.select, Genes_tagged > 1000)
```

Subset counts table to include only cells that passed QC.

```{r}
counts.select <- dplyr::select(counts.all, one_of(meta.select$ID))
head(counts.select[,1:10])
```
M--
Carmen Sandoval
  • It's difficult to tell without some data that reproduces the problem, but per the error message it's likely `counts.all` is `NULL` (although it's likely `cbind` is going to cause you other problems with `select`). To troubleshoot, return something like `head(counts.all)` after you create that object and comment out the `dplyr::select` line while you figure out what is going on. – aosmith Oct 05 '15 at 21:11
  • Thanks for your response. You are right -- `head(counts.all)` returns `NULL` in the `knit` HTML file. However, when I do `head(counts.all)` in the console after running the `cbind` command, it returns exactly what it should. Why is this? Should I replace `cbind` with another command, perhaps from `dplyr`? – Carmen Sandoval Oct 06 '15 at 00:54
  • I replaced the `cbind` command with `dplyr::bind_cols` (please see update). Thanks for your help! – Carmen Sandoval Oct 06 '15 at 01:19
  • Edit: I replaced the command, but still get an error when knitting. – Carmen Sandoval Oct 06 '15 at 01:34
  • Well, if `counts.all` doesn't exist then it's possible `count.tables` didn't come in correctly. Consider troubleshooting one chunk at a time - i.e., check what `count.tables` looks like after each chunk to see if you can figure out why `counts.all` isn't being made. Make sure you are testing outside your .rmd in a *clean* R session with no objects already loaded in it. – aosmith Oct 06 '15 at 01:48
  • It's really interesting -- I cleared up everything, started a new R session with no loaded objects, and when I run chunk by chunk, everything works fine and looks as it should. However, when knitting the .Rmd, the tables are `NULL`.... – Carmen Sandoval Oct 06 '15 at 02:40
  • Is the .rmd file in the same folder as the datasets? I bet they aren't being read in, which is another reason to check the results of each chunk after knitting. – aosmith Oct 06 '15 at 03:04
  • @marc_aragones: this is _Rmarkdown_, not Markdown. Please don't change it. – Hong Ooi Nov 24 '16 at 11:53

2 Answers

I just realized how old this post is; however, I'll still add my thoughts:

It's a little hard to follow along since we don't have access to the data. Can you try to use this as a reproducible example? I added print() to the end of each statement so you can quickly view the results.

I think some of the data isn't coming through as expected. Check your results against the data structure assumptions I made here to see where our two outputs diverge:

library(tidyverse)

count.tables <- 
  list(
    SF10281 = head(mpg) |> select(SF10281 = cty),
    SF10282 = head(mpg) |> select(SF10282 = cty),
    SF10345 = head(mpg) |> select(SF10345 = cty),
    SF10360 = head(mpg) |> select(SF10360 = cty)
  ) |> 
  print()

counts.all <- # can also just use this purrr::map_dfc(count.tables, bind_cols)
  bind_cols(
    count.tables$SF10281,
    count.tables$SF10282,
    count.tables$SF10345,
    count.tables$SF10360
  ) |> 
  print()

meta <- 
  tibble(
    ID = names(counts.all),
    `Live-dead_call` = rep("1g", 4),
    Genes_tagged = c(400, 800, 1200, 1600)
  ) |> 
  janitor::clean_names() |>  # so everything is snake_case and lower case
  print()

meta.select <- 
  meta |>
  filter(
    str_detect(live_dead_call, "^1g"),
    genes_tagged > 1000
  ) |> 
  print()
  

counts.select <- 
  counts.all |> 
  select(one_of(meta.select$id)) |>
  print()
yake84

This is a common error when working with large, complex datasets, so the question deserves a generalizable solution.

In general, error messages like this:

Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "NULL"

tell you that the object passed to the function as its data argument is NULL. Here, the counts.all table handed to select() was never actually created.

A useful troubleshooting method is to test each of the subsetting expressions that feed the failing call and examine what it returns. For this example, create temporary debug variables:

tempvar1 <- count.tables[["SF10281"]]
tempvar2 <- count.tables[["SF10282"]]

I suspect both of these will be empty.
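
How two empty lookups turn into the NULL in the error message can be sketched as follows. This is a minimal reproduction that assumes list.files() found no matching files at knit time; that is an assumption, not something confirmed in the question.

```r
# Assumption: no count files are visible from the knit-time working directory,
# so list.files() would have returned an empty character vector.
count.files <- character(0)

# lapply() over an empty vector returns an empty list without any error.
count.tables <- lapply(count.files, read.table, header = TRUE, row.names = 1)
names(count.tables) <- gsub("_counts\\.txt$", "", count.files)

# Extracting a name that is not in the list returns NULL, again silently.
tempvar1 <- count.tables[["SF10281"]]
tempvar2 <- count.tables[["SF10282"]]

# cbind() of NULLs is NULL, which is the object select() then fails on.
counts.all <- cbind(tempvar1, tempvar2)
is.null(counts.all)
```

None of the steps before select() raises an error, which is why the failure only surfaces several chunks later.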

In general, these problems can have either of two sources:

  1. The source data files have some corruption or stray whitespace. Use the read_table() function from readr (part of the tidyverse), which does some additional cleaning.

  2. "SF10281" and "SF10282" may not be recognized names at the top level of the data structure - this can be checked by running names(count.tables) to view the list of names. If missing, the source files may be differently named or located in a different directory.
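
The name check in point 2 can be turned into an early, explicit failure. The sketch below uses a hypothetical helper, check_samples(), and a toy list standing in for count.tables; neither comes from the original post.

```r
# Hypothetical helper (not from the original post): stop with a clear message
# if any expected sample name is absent from the list of count tables.
check_samples <- function(tables, expected) {
  absent <- setdiff(expected, names(tables))
  if (length(absent) > 0) {
    stop("Missing count tables: ", paste(absent, collapse = ", "),
         " (working directory: ", getwd(), ")")
  }
  invisible(tables)
}

# Toy stand-in for count.tables with only one of the two samples present.
toy.tables <- list(SF10281 = data.frame(A1 = 1:3))

# Capture the error message instead of aborting, so the result can be inspected.
result <- tryCatch(
  check_samples(toy.tables, c("SF10281", "SF10282")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Calling such a check right after the list is built makes the knit stop at the chunk where the data went missing, rather than at a later select() call.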

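In fact, knitr evaluates chunks with the working directory set to the folder containing the .Rmd file, which may differ from the working directory of your console session. Double-check that knitr is reading from the directory you intend. See: Setting Working Directory using knitr::opts_knit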
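
A setup chunk along these lines is one way to check and, if needed, change the directory used while knitting. The path in data.dir is a placeholder and not taken from the post; root.dir is the knitr package option for the directory in which chunks are evaluated, and a change made in one chunk applies to the chunks that follow it.

```r
# Placeholder path (an assumption); replace with the folder holding the
# *_counts.txt files.
data.dir <- "path/to/count/files"

# Only change the evaluation directory if the folder actually exists.
if (dir.exists(data.dir)) {
  knitr::opts_knit$set(root.dir = data.dir)
}

# Diagnostics: printing these in the knitted output shows where knitr is
# looking and which count files it can see from there.
wd <- getwd()
found <- list.files(pattern = "^SF[0-9]+_counts.txt$")
print(wd)
print(found)
```

If found is empty in the knitted HTML but not in the console, the two sessions are reading from different directories.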

I hope these tips help others who run into a similar error message.

GGAnderson