How to create DESeqDataSetFromMatrix from 2 vectors of numbers?

Question

I have two datasets, each having the form:

Gene1Name, 234
Gene2Name, 445
Gene3Name, 23
...
GeneNName, 554

The gene names are identical for each of the 2 datasets. The numbers on the second column are the expression counts for the corresponding gene.

I want to perform a differential gene expression analysis on these datasets. For that, I am using the DESeq2 library.

To use the DESeq function, one first needs to create a DESeqDataSet object:

dds <- DESeqDataSetFromMatrix(countData=data, colData=meta, design=~sampletype)

For my case, what needs to be passed as arguments into the DESeqDataSetFromMatrix function?

So those data sets, you've mentioned, are different treatments? — utubun, May 15 '19 at 17:30

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

If you follow this simple example, it should at least help you solve your real problem.

We have to start by preparing a dummy data set (please read how to make a minimal reproducible example):

Make a `treatment` data set:

library(tidyverse)

set.seed(56154455)

treatment <- data.frame(
  geneName = LETTERS,
  cts      = sample(0:1000, 26)
)

head(treatment)

#   geneName cts
# 1        A 834
# 2        B 860
# 3        C 950
# 4        D 302
# 5        E 979
# 6        F 159

Make a `control` data set:

set.seed(56154455)

control   <- treatment[sample(1:26, 26), ]
control[, 1] <- treatment[, 1]

head(control)

#    geneName cts
# 3         A 950
# 23        B  41
# 15        C 889
# 20        D 629
# 14        E 398
# 4         F 302

Join both `treatment` and `control` by `geneName`

cts <- full_join(treatment, control, by = 'geneName') %>%
  rename('treatment' = cts.x, 'control' = cts.y) %>%
  column_to_rownames('geneName') %>%
  as.matrix

head(cts)

#   treatment control
# A       834     950
# B       860      41
# C       950     889
# D       302     629
# E       979     398
# F       159     302
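If you would rather not load the tidyverse for this step, the same join can be sketched in base R with `merge()`. The three-gene data frames below are toy values I made up for illustration (they are not the data from this answer), and `cts_base` is just a hypothetical name.

```r
# Toy inputs: same genes in both tables, but in a different row order
treatment <- data.frame(geneName = c("A", "B", "C"),
                        cts = c(10L, 20L, 30L),
                        stringsAsFactors = FALSE)
control   <- data.frame(geneName = c("C", "A", "B"),
                        cts = c(3L, 1L, 2L),
                        stringsAsFactors = FALSE)

# merge() matches rows by gene name, so the row order of the inputs does not matter
merged <- merge(treatment, control, by = "geneName",
                suffixes = c(".treatment", ".control"))

# Keep only the two count columns and move the gene names into the row names
cts_base <- as.matrix(merged[, c("cts.treatment", "cts.control")])
dimnames(cts_base) <- list(merged$geneName, c("treatment", "control"))

cts_base
```

Matching by name rather than by position is the point here: it protects you if the two files list the genes in a different order.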

Prepare your `coldata` table

Remember, this is just a dummy example, so your real coldata might include any number of columns reflecting the design of your experiment. However, the number of rows in your coldata has to equal the number of columns in your count data (here, cts). Please read the documentation for the SummarizedExperiment class, where you can find a detailed explanation. Another great resource is Rafa's book.

coldata <- matrix(c("DMSO", "1xPBS"), dimnames = list(colnames(cts), 'treatment'))

coldata

#        treatment
# treatment "DMSO"   
# control   "1xPBS"
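A quick way to catch a mismatch early is to assert the rule before building the object. This is only a sketch on toy values. Storing the metadata as a data.frame with a factor column (instead of the character matrix above) is my own choice here, not something this answer requires.

```r
# Toy count matrix: 3 genes (rows) x 2 samples (columns)
cts <- matrix(c(10L, 20L, 30L, 1L, 2L, 3L), ncol = 2,
              dimnames = list(c("A", "B", "C"), c("treatment", "control")))

# One metadata row per sample, named after the columns of cts
coldata <- data.frame(
  treatment = factor(c("DMSO", "1xPBS")),
  row.names = colnames(cts)
)

# Fail early if the metadata does not line up with the count matrix
stopifnot(
  nrow(coldata) == ncol(cts),
  identical(rownames(coldata), colnames(cts))
)
```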

Finally, create your `DESeqDataSet`:

dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = cts, 
  colData   = coldata, 
  design    = ~treatment
  )

Where:

countData is your count matrix, prepared as above;
colData is your coldata matrix, holding the experimental metadata;
design = ~treatment is the formula describing the model you test in your experiment. It could be anything like ~ treatment + sex * age, as long as every variable in it is a column of colData.
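To get a feeling for what a design formula means, you can expand it yourself with base R's `model.matrix()`. As far as I understand, DESeq2 builds a comparable design matrix from the formula internally, but treat that as an assumption and check the vignette. The explicit `levels` argument and the name `design_matrix` are mine, chosen so that 1xPBS is the reference level.

```r
# Toy metadata for the two samples; 1xPBS is set as the reference level
coldata <- data.frame(
  treatment = factor(c("DMSO", "1xPBS"), levels = c("1xPBS", "DMSO")),
  row.names = c("treatment", "control")
)

# Expand the formula into a design matrix:
# an intercept column plus one indicator column for the non-reference level
design_matrix <- model.matrix(~ treatment, data = coldata)

colnames(design_matrix)
# "(Intercept)"   "treatmentDMSO"
```

The `treatmentDMSO` column is 1 for the DMSO sample and 0 for the 1xPBS sample, so its coefficient describes the DMSO versus 1xPBS difference.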


dds

# class: DESeqDataSet 
# dim: 26 2 
# metadata(1): version
# assays(1): counts
# rownames(26): A B ... Y Z
# rowData names(0):
# colnames(2): treatment control
# colData names(1): treatment

thc · Answer 2 · 2019-05-15T18:23:30.243

1

You just need to bind the two count vectors column-wise into a matrix.

Since you said each of your two datasets contains two columns, I assume the first is the gene name and the second is the count. You also mentioned that the gene names are the same, so you can do this:

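library(DESeq2)

# x1 and x2 are the two datasets: column 1 = gene name, column 2 = count
data <- cbind(x1[, 2], x2[, 2])
rownames(data) <- x1[, 1]
colnames(data) <- c("sample1", "sample2")

# one metadata row per column of `data`
meta <- data.frame(sampletype = factor(c("A", "B")), row.names = colnames(data))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~sampletype)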
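For a version you can paste and run, here is a sketch with made-up inputs. The contents of `x1` and `x2`, their column names and the `x2` counts are assumptions for illustration only. The `identical()` check guards the "names are the same" assumption, because `cbind()` pairs rows by position and not by gene name.

```r
# Hypothetical inputs: column 1 = gene name, column 2 = count
x1 <- data.frame(gene = c("Gene1Name", "Gene2Name", "Gene3Name"),
                 count = c(234L, 445L, 23L), stringsAsFactors = FALSE)
x2 <- data.frame(gene = c("Gene1Name", "Gene2Name", "Gene3Name"),
                 count = c(200L, 500L, 30L), stringsAsFactors = FALSE)

# cbind() pairs rows by position, so the gene order must match exactly
stopifnot(identical(x1[, 1], x2[, 1]))

# Naming the arguments sets the column (sample) names directly
data <- cbind(sample1 = x1[, 2], sample2 = x2[, 2])
rownames(data) <- x1[, 1]

# One metadata row per sample, named after the columns of `data`
meta <- data.frame(sampletype = factor(c("A", "B")),
                   row.names = colnames(data))

# Build the DESeqDataSet only if DESeq2 is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = data,
                                        colData = meta,
                                        design = ~ sampletype)
}
```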

edited May 15 '19 at 18:23

answered May 15 '19 at 17:53


And what do you pass as `meta` as well as `sampletype`? – mercury0114 May 15 '19 at 18:03
See updated answer. You should check out the vignette for DESeq2, I think it's a very good tutorial. – thc May 15 '19 at 18:23
