I have an object (variable rld) which looks a bit like a "data.frame" (see further down the post for details) in that it has columns that can be accessed using $ or [[]].

I have a vector groups containing names of some of its columns (3 in example below).

I generate strings based on combinations of elements in the columns as follows:

paste(rld[[groups[1]]], rld[[groups[2]]], rld[[groups[3]]], sep="-")

I would like to generalize this so that I don't need to know how many elements are in groups.

The following attempt fails:

> paste(rld[[groups]], collapse="-")
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, error.if.nomatch = FALSE) : 
  attempt to extract more than one element

Here is how I would do it in a functional style with a Python dictionary:

map("-".join, zip(*map(rld.get, groups)))

Is there a similar column-getter operator in R?


As suggested in the comments, here is the output of dput(rld): http://paste.ubuntu.com/23528168/ (I could not paste it directly, since it is huge.)

This was generated using the DESeq2 bioinformatics package, more precisely by doing something similar to what is described on page 28 of this document: https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf.

DESeq2 can be installed from bioconductor as follows:

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")

Reproducible example

One of the solutions worked when running in interactive mode, but failed when the code was put in a library function, with the following error:

Error in do.call(function(...) paste(..., sep = "-"), colData(rld)[groups]) : 
  second argument must be a list

After some tests, it appears that the problem doesn't occur if the function is in the main calling script, as follows:

library(DESeq2)
library(test.package)

lib_names <- c(
    "WT_1",
    "mut_1",
    "WT_2",
    "mut_2",
    "WT_3",
    "mut_3"
)
file_names <- paste(
    lib_names,
    "txt",
    sep="."
)

wt <- "WT"
mut <- "mut"
genotypes <- rep(c(wt, mut), times=3)
replicates <- c(rep("1", times=2), rep("2", times=2), rep("3", times=2))

sample_table = data.frame(
    lib = lib_names,
    file_name = file_names,
    genotype = genotypes,
    replicate = replicates
)

dds_raw <- DESeqDataSetFromHTSeqCount(
    sampleTable = sample_table,
    directory = ".",
    design = ~ genotype
    )

# Remove genes with too few read counts
dds <- dds_raw[ rowSums(counts(dds_raw)) > 1, ]
dds$group <- factor(dds$genotype)
design(dds) <- ~ replicate + group
dds <- DESeq(dds)

test_do_paste <- function(dds) {
    require(DESeq2)
    groups <- head(colnames(colData(dds)), -2)
    rld <- rlog(dds, blind=FALSE)
    stopifnot(all(groups %in% names(colData(rld))))
    combined_names <- do.call(
        function (...) paste(..., sep = "-"),
        colData(rld)[groups]
    )
    print(combined_names)
}

test_do_paste(dds)
# This fails (with the same function put in a package)
#test.package::test_do_paste(dds)

The error occurs when the function is packaged as in https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/

Data used in the example:

I posted this issue as a separate question: do.call error "second argument must be a list" with S4Vectors when the code is in a library

Although I have an answer to my initial question, I'm still interested in alternative solutions for the "column extraction using a vector of column names" issue.

  • You need to post a reproducible example, if you used `dput(yourObject)` then we wouldn't have to scratch our heads as to what *'I have an object (variable rld) which looks a bit like a "data.frame" in that it has columns that can be accessed using $ or [[]]'* really meant: named list? data.table? something else? – smci Nov 24 '16 at 16:27
  • Until I see reproducible data that clarifies what *'object... looks a bit like a "data.frame"'* means, I'm voting-to-close. It takes like 60 seconds to paste it. – smci Nov 24 '16 at 16:29
  • @smci I didn't know about `dput`. I have only been using R for a few months. I will try this. – bli Nov 24 '16 at 16:32
  • @bli ok, please use `dput()` and give a very quick skim of **[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)** – smci Nov 24 '16 at 16:36
  • @smci In my defense, I initially thought that my problem could be solved using techniques that apply to data frames in general. I considered citing the vignette that I finally mentioned, but decided that it would likely be of no use. I thought that the error message "attempt to extract more than one element" might ring a bell for more experienced R programmers (reason for which I had included it in the title (which was later edited)). – bli Nov 24 '16 at 17:03
  • @smci I added a reproducible example – bli Nov 24 '16 at 18:01
  • Ok so your object is a `DESeq2::DESeqTransform`. You could just use `dput(head(dds, 10))` to show the first few lines, the other 750 don't add much. I don't know DESeq2 but someone else here should. Didn't mean to be cranky but we can't do much without a reproducible example. Sorry I can't retract my vote-to-close. By the way almost all new data structures in R these days tend to look like a data frame :) Glad to see you found your solution. – smci Nov 25 '16 at 12:43
  • Apparently, votes to close can be retracted now: http://meta.stackoverflow.com/questions/303253/asking-the-same-question-again-but-with-different-words/303254#comment415189_303254 – bli Nov 26 '16 at 07:05
  • Done ................ – smci Nov 26 '16 at 14:07

1 Answer

We may use either of the following:

do.call(function (...) paste(..., sep = "-"), rld[groups])
do.call(paste, c(rld[groups], sep = "-"))

We can consider a small, reproducible example:

rld <- mtcars[1:5, ]
groups <- names(mtcars)[c(1,3,5,6,8)]
do.call(paste, c(rld[groups], sep = "-"))
#[1] "21-160-3.9-2.62-0"     "21-160-3.9-2.875-0"    "22.8-108-3.85-2.32-1" 
#[4] "21.4-258-3.08-3.215-1" "18.7-360-3.15-3.44-0"

Note that it is your responsibility to ensure that all(groups %in% names(rld)) is TRUE; otherwise you get a "subscript out of bounds" or "undefined columns selected" error.
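For comparison, here is a minimal sketch of two other ways to extract columns from a vector of names and paste them row-wise. It uses the same mtcars subset as above; the variable names `missing_cols`, `via_do_call`, `via_reduce` and `via_lapply` are made up for this illustration. The `lapply` variant is the closest analogue to the `map(rld.get, groups)` idiom from the question, since it calls `[[` once per column name.

```r
rld <- mtcars[1:5, ]
groups <- names(mtcars)[c(1, 3, 5, 6, 8)]

# Fail early with a readable message if a requested column is missing.
missing_cols <- setdiff(groups, names(rld))
if (length(missing_cols) > 0) {
    stop("Unknown columns: ", paste(missing_cols, collapse = ", "))
}

# Variant from the answer: pass all selected columns to a single paste() call.
via_do_call <- do.call(paste, c(rld[groups], sep = "-"))

# Fold the columns pairwise; works for any number of columns.
via_reduce <- Reduce(function(a, b) paste(a, b, sep = "-"), rld[groups])

# Extract each column with [[ ]], one name at a time, then paste.
via_lapply <- do.call(
    paste,
    c(lapply(groups, function(g) rld[[g]]), sep = "-")
)

# All three variants should produce the same character vector.
stopifnot(identical(via_do_call, via_reduce))
stopifnot(identical(via_do_call, via_lapply))
print(via_do_call)
```

The `Reduce` form avoids building an argument list by hand, at the cost of one `paste` call per column; for a handful of columns the difference is unlikely to matter.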


(I am copying your comment as a follow-up)

It seems the methods you propose don't work directly on my object. However, the package I'm using provides a colData function that makes something more similar to a data.frame:

> class(colData(rld))
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"

do.call(function (...) paste(..., sep = "-"), colData(rld)[groups]) works, but do.call(paste, c(colData(rld)[groups], sep = "-")) fails with an error message I fail to understand (as too often with R...):

> do.call(paste, c(colData(rld)[groups], sep = "-"))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘mcols’ for signature ‘"character"’
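A possible explanation, offered as an assumption rather than something verified against the S4Vectors sources: when the first argument of `c()` is an S4 `DataFrame`, `c()` dispatches to an S4 method that tries to treat `"-"` as another `DataFrame`-like object, hence the complaint about `mcols` on a character. One way to sidestep this is to convert the selected columns to a plain `data.frame` first, so that `c()` builds an ordinary list. The helper name `paste_cols_via_data_frame` below is hypothetical and not part of DESeq2 or S4Vectors, and the `DataFrame` part of the demo only runs if S4Vectors is installed.

```r
# Hypothetical helper: coerce the selected columns to a plain data.frame,
# then paste them row-wise. unname() prevents a column that happens to be
# called "sep" or "collapse" from being taken as a paste() argument.
paste_cols_via_data_frame <- function(x, groups, sep = "-") {
    stopifnot(all(groups %in% names(x)))
    plain <- as.data.frame(x[groups])
    do.call(paste, c(unname(as.list(plain)), sep = sep))
}

# Works on an ordinary data.frame.
demo <- data.frame(genotype = c("WT", "mut"), replicate = c("1", "1"))
demo_names <- paste_cols_via_data_frame(demo, c("genotype", "replicate"))
print(demo_names)

# Same call on an S4Vectors DataFrame, only if the package is available.
if (requireNamespace("S4Vectors", quietly = TRUE)) {
    suppressPackageStartupMessages(library(S4Vectors))
    cd <- DataFrame(genotype = c("WT", "mut"), replicate = c("1", "1"))
    print(paste_cols_via_data_frame(cd, c("genotype", "replicate")))
}
```

With the real object, the call would presumably be `paste_cols_via_data_frame(colData(rld), groups)`; this has not been tested against the data from the question.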
  • It seems the methods you propose don't work directly on my object. However, the package I'm using provides a `colData` function that makes something more similar to a data.frame: `class(colData(rld))` returns these lines: `[1] "DataFrame"`, `attr(,"package")` and `[1] "S4Vectors"`. `do.call(function (...) paste(..., sep = "-"), colData(rld)[groups])` works, but `do.call(paste, c(colData(rld)[groups], sep = "-"))` fails with an error message I fail to understand (as too often with R...) – bli Nov 24 '16 at 15:28
  • Strangely, things work when I load my R script in the interactive session and manually use the `do.call`, but when I run my script with command-line arguments it fails saying "Error in do.call(function(...) paste(..., sep = "-"), colData(rld)[groups]) (from load_WT_prg1_data.R#80) : second argument must be a list". – bli Nov 24 '16 at 16:02
  • Your answer is useful, but didn't completely solve the problem. If I leave it marked "solved", fewer people will look at it and I will have fewer chances of getting other useful answers. You are right that it would be better with a reproducible example, but my script and the libraries it loads are quite complicated, so it will take me quite some time before I manage to build an example that can be posted here. – bli Nov 24 '16 at 16:30