
I have created a heatmap of the top 100 differentially expressed transcripts, labelled with Ensembl IDs, for three samples (RNA-seq -> kallisto -> sleuth).

library(gplots) # heatmap.2
library(tidyr) # unite

heatmap.2(log(tmp_df+1), trace="none", density.info="none", scale="row")

Now I would like to add gene names to the heatmap labels, e.g. EnsemblID_genename. To do this, I proceeded as follows:

1) Created "ext_gene" column in tmp_df

tmp_df["ext_gene"] <- NA

2) Matched target_id from tmp_df with target_id of top_100new (containing ext_gene)

tmp_df$ext_gene <- top_100new$ext_gene[match(tmp_df$target_id, top_100new$target_id)]

3) Merged target_id and ext_gene columns in tmp_df (target_id__ext_gene column in place of target_id and ext_gene columns)

unite <- unite(tmp_df, target_id__ext_gene, target_id, ext_gene, sep='_')

4) I was unable to convert "unite" to a numeric matrix. The column "target_id__ext_gene" has NA in some rows, e.g. ENSEMBL00001_NA, so I tried to replace NA with NONE.

unite$target_id__ext_gene <- sub('_NA$', '_NONE', unite$target_id__ext_gene)

5) However, I still cannot convert "unite" into a numeric matrix, because the column "target_id__ext_gene" is of class character. I tried:

unite$target_id__ext_gene <- as.numeric(as.character(unite$target_id__ext_gene))

but this converts every row of the column "target_id__ext_gene" to NA (NAs introduced by coercion).

I know that heatmap.2 needs a numeric object, and that is where I am stuck because of this one character column.

This is what my data looks like, using reproduce(unite):

                   target_id__ext_gene      T2   Npt3         n1   Npt1   Npt2        T3
    1)             ENS00000112_NONE 5239.1161 0.000000e+00 1.117028e+03 0.000000e+00 0.000000e+00 3905.476311
    2)             ENS00000150_tfb2m  771.3926 1.012137e+03 4.132779e-06 7.785302e+02 7.625490e+02  634.195429
...
    99)             ENS00000130_NONE  136.2607 1.658801e+00 1.498763e+02 2.733379e+00 0.000000e+00   64.313849
    100)            ENS00000124_NONE  606.0573 1.155628e+02 3.062783e+02 1.054907e+02 1.084090e+02  430.250175

               n3         n2         T1
1)   1.327292e+03 1.401719e+03 4230.5667240
2)   1.561575e-06 1.113367e-06  526.1571307
...
99)  1.511978e+02 1.240264e+02   68.4360589
100) 3.817887e+02 4.725010e+02  636.0279422
zx8754
bio8

1 Answer


The issue is likely that heatmap.2 requires a numeric matrix and expects the row names to serve as labels. You can get the same result from your data without setting the row names, by passing the labels separately.
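To illustrate why the label column gets in the way, here is a minimal base R sketch. The data frame `toy` and its values are made up for illustration; only the column name `target_id__ext_gene` comes from the question.

```r
# Hypothetical two-row stand-in for the question's data (values are invented)
toy <- data.frame(
  target_id__ext_gene = c("ENS00000112_NONE", "ENS00000150_tfb2m"),
  T1 = c(4230.6, 526.2),
  n1 = c(1117.0, 0.0),
  stringsAsFactors = FALSE
)

# Labels are not numbers, so coercing them gives NA for every row
coerced <- suppressWarnings(as.numeric(toy$target_id__ext_gene))

# as.matrix() on mixed column types falls back to a character matrix
mixed <- as.matrix(toy)

# Dropping the label column leaves a purely numeric matrix
num <- as.matrix(toy[, -1])

is.numeric(mixed)  # FALSE
is.numeric(num)    # TRUE
```

The labels therefore have to travel separately from the numbers, either as row names or through an explicit argument.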

First, some actual reproducible data:

df <-
  data.frame(
    target_id__ext_gene = LETTERS
    , matrix(rnorm(26*6, 50, 5)
             , nrow = 26)
  )

Then, pass the numeric portion to heatmap.2 as a matrix (without the labelling column), and pass in the labels explicitly:

heatmap.2(
  as.matrix(df[, -1])
  , labRow = df$target_id__ext_gene
)

Produces:

[Figure: the resulting heatmap, with rows labelled by the letters in target_id__ext_gene]

Which you can further modify with the other settings you want.
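For comparison, a sketch of the row-name route, which should give the same labels because heatmap.2 falls back to the matrix row names when `labRow` is not supplied. The names `mat` and `log_mat` and the seed are assumptions made for this example; the `log(x + 1)` transform mirrors the call in the question.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable

# Same kind of example data as above
df <- data.frame(
  target_id__ext_gene = LETTERS,
  matrix(rnorm(26 * 6, 50, 5), nrow = 26)
)

# Keep only the numeric columns and attach the labels as row names
mat <- as.matrix(df[, -1])
rownames(mat) <- as.character(df$target_id__ext_gene)

# Same transform as in the question; the row names survive it
log_mat <- log(mat + 1)
```

`log_mat` can then be passed as the first argument to heatmap.2 without a `labRow` argument.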

Mark Peterson
  • I understand that the rnorm function generates normally distributed random numbers, but how come the mean is 50 and the sd is 5? For my data, I don't know the values of the mean and sd. Do I need to calculate them? Or, as per [link](cyclismo.org/tutorial/R/probability.html), are mean and sd optional arguments? So I used `df <- unite(target_id__ext_gene = LETTERS, matrix(rnorm(100*9), nrow = 100))` because I have 100 rows and 9 columns. I got this error: **Error in unite(target_id__ext_gene = LETTERS, matrix(rnorm(100 * 9), : unused argument (target_id__ext_gene = LETTERS)** – bio8 Dec 21 '16 at 09:57
  • I just played around and was able to do it without rnorm and with an updated `heatmap.2` call. Here it is: `heatmap.2(log(as.matrix(unite[, -1]) + 1), labRow = unite$target_id__ext_gene, trace="none", density.info="none", xlab = NULL, ylab = NULL, main = "Heatmap", col=redgreen(75), margins = c(5.5, 8), scale="row", cexCol = 0.8, cexRow=0.5)`. – bio8 Dec 21 '16 at 10:38
  • The construction of `df` was only to show an example, which was necessitated because you did not provide reproducible data. Thus, it will not (obviously) match your actual data. Also: yes, what you wrote in the comment is the extension of my answer to match your actual data. The options beyond `labRow` are, however, not part of the MCVE of your problem (hence why I omitted them in my answer). – Mark Peterson Dec 21 '16 at 13:46