I have data that look like this:

Gene HBEC-KT-01 HBEC-KT-02 HBEC-KT-03 HBEC-KT-04 HBEC-KT-05 Primarycells-02 Primarycells-03 Primarycells-04 Primarycells-05
BPIFB1 15726000000 15294000000 15294000000 14741000000 22427000000 87308000000 2.00E+11 1.04E+11 1.51E+11
LCN2 18040000000 26444000000 28869000000 30337000000 10966000000 62388000000 54007000000 56797000000 38414000000
C3 2.52E+11 2.26E+11 1.80E+11 1.80E+11 1.78E+11 46480000000 1.16E+11 69398000000 78766000000
MUC5AC 15647000 8353200 12617000 12221000 29908000 40893000000 79830000000 28130000000 69147000000
MUC5B 965190000 693910000 779970000 716110000 1479700000 38979000000 90175000000 41764000000 50535000000
ANXA2 14705000000 18721000000 21592000000 18904000000 22657000000 28163000000 24282000000 21708000000 16528000000

I want to make a heatmap like the following using R. I am following a paper, which states: "Heat maps were generated with the ‘pheatmap’ package [76], where correlation clustering distance row was applied". Here is their heatmap.

[Image: heatmap from the paper, made with pheatmap in R]

I want to reproduce this figure and have been trying to build one in R by following tutorials, but I am new to R and have no prior experience with it.

Here is my code.

library(pheatmap)
library(RColorBrewer)  # provides brewer.pal()

df <- read.delim("R.txt", header = TRUE, row.names = "Gene")
df_matrix <- data.matrix(df)
pheatmap(df_matrix,
     main = "Heatmap of Extracellular Genes",
     color = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(10),
     cluster_cols = FALSE,
     show_rownames = FALSE,
     fontsize_col = 10,
     cellwidth = 40
     )

This is what I get.

[Image: the resulting heatmap]

When I try to add clustering, I get the following error.

pheatmap(
mat = df_matrix,
  scale = "row",
  cluster_column = F,
  show_rownames = TRUE,
  drop_levels = TRUE,
  fontsize = 5,
  clustering_method = "complete",
  main = "Hierachical Cluster Analysis"
)

Error in hclust(d, method = method) : 
NA/NaN/Inf in foreign function call (arg 10)

Can someone help me with the code?

Huzaifa Arshad
  • Why not just `heatmap(df_matrix[, -1])`? – jay.sf Sep 26 '21 at 07:36
  • Yes, but the graph is too small, there are gene names on the right that I don't want, and there is clustering on top that I also don't want. Also, I am not sure whether the graph uses the correlation clustering distance for rows like the paper does – Huzaifa Arshad Sep 26 '21 at 07:40
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Data should be in the question itself rather than stored on a potentially unsafe external server. What help do you need with the code exactly? It seems you have a data problem rather than a coding problem. Your values are likely very skewed and you probably have missing values. – MrFlick Sep 26 '21 at 08:05
  • I am sure that the authors of the paper will help you if you ask them nicely for the code. – jay.sf Sep 26 '21 at 08:08
  • I don't know how to use this website, which is why I put a link to the data. My data don't have missing values, but the intensity values are very large. I need help – Huzaifa Arshad Sep 26 '21 at 08:11
  • @MrFlick I edited my question. Please have a look at it. I just want a heatmap like the first one, but I am unable to make one. – Huzaifa Arshad Sep 26 '21 at 09:21

1 Answer

You can normalize the data with scale() to achieve a more uniform coloring. Here, the mean expression of each sample is set to 0, so genes expressed below the average get a negative z-score:

library(tidyverse)
library(pheatmap)

data <- tribble(
  ~Gene, ~`HBEC-KT-01`, ~`HBEC-KT-02`, ~`HBEC-KT-03`, ~`HBEC-KT-04`, ~`HBEC-KT-05`, ~`Primarycells-03`, ~`Primarycells-04`, ~`Primarycells-05`,
  "BPIFB1", 1.5726e+10, 1.5294e+10, 1.5294e+10, 1.4741e+10, 2.2427e+10, 2e+11, 1.04e+11, 1.51e+11,
  "LCN2", 1.804e+10, 2.6444e+10, 2.8869e+10, 3.0337e+10, 1.0966e+10, 5.4007e+10, 5.6797e+10, 3.8414e+10,
  "C3", 2.52e+11, 2.26e+11, 1.8e+11, 1.8e+11, 1.78e+11, 1.16e+11, 6.9398e+10, 7.8766e+10,
  "MUC5AC", 15647000, 8353200, 12617000, 12221000, 29908000, 7.983e+10, 2.813e+10, 6.9147e+10,
  "MUC5B", 965190000, 693910000, 779970000, 716110000, 1479700000, 9.0175e+10, 4.1764e+10, 5.0535e+10,
  "ANXA2", 1.4705e+10, 1.8721e+10, 2.1592e+10, 1.8904e+10, 2.2657e+10, 2.4282e+10, 2.1708e+10, 1.6528e+10
)
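data %>%
  # z-score each sample (column); as.numeric() keeps the columns plain vectors
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) %>%
  column_to_rownames("Gene") %>%
  pheatmap(
    scale = "row",          # then z-score each gene (row)
    cluster_cols = FALSE,   # the pheatmap argument is cluster_cols, not cluster_column
    show_rownames = FALSE,
    show_colnames = TRUE,
    treeheight_col = 0,
    drop_levels = TRUE,
    fontsize = 5,
    clustering_method = "complete",
    main = "Hierarchical Cluster Analysis (z-score)"
  )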

Created on 2021-09-26 by the reprex package (v2.0.1)
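
The paper quoted in the question mentions a correlation clustering distance for rows, which pheatmap exposes as `clustering_distance_rows = "correlation"`. The following is a minimal sketch of one way to get closer to that figure, not necessarily what the authors did. The log2 transform with a pseudocount of 1 is my assumption for taming the skewed intensities, and the `FLAT` row is a hypothetical gene added only to illustrate the filter. Rows with identical values in every sample are dropped first, because row scaling divides by a standard deviation of zero and yields NaN, which is one common cause of the `NA/NaN/Inf in foreign function call` error from `hclust`.

```r
library(pheatmap)

# Intensities copied from the question (six genes, nine samples).
mat <- rbind(
  BPIFB1 = c(1.5726e10, 1.5294e10, 1.5294e10, 1.4741e10, 2.2427e10, 8.7308e10, 2.00e11, 1.04e11, 1.51e11),
  LCN2   = c(1.8040e10, 2.6444e10, 2.8869e10, 3.0337e10, 1.0966e10, 6.2388e10, 5.4007e10, 5.6797e10, 3.8414e10),
  C3     = c(2.52e11, 2.26e11, 1.80e11, 1.80e11, 1.78e11, 4.6480e10, 1.16e11, 6.9398e10, 7.8766e10),
  MUC5AC = c(1.5647e7, 8.3532e6, 1.2617e7, 1.2221e7, 2.9908e7, 4.0893e10, 7.9830e10, 2.8130e10, 6.9147e10),
  MUC5B  = c(9.6519e8, 6.9391e8, 7.7997e8, 7.1611e8, 1.4797e9, 3.8979e10, 9.0175e10, 4.1764e10, 5.0535e10),
  ANXA2  = c(1.4705e10, 1.8721e10, 2.1592e10, 1.8904e10, 2.2657e10, 2.8163e10, 2.4282e10, 2.1708e10, 1.6528e10)
)
colnames(mat) <- c(
  "HBEC-KT-01", "HBEC-KT-02", "HBEC-KT-03", "HBEC-KT-04", "HBEC-KT-05",
  "Primarycells-02", "Primarycells-03", "Primarycells-04", "Primarycells-05"
)

# Hypothetical gene with the same value in every sample (not in the real data);
# it stands in for the kind of row that breaks row scaling.
mat <- rbind(mat, FLAT = rep(1e9, ncol(mat)))

# Assumption: log2 with a pseudocount of 1 to compress the dynamic range.
log_mat <- log2(mat + 1)

# Keep only rows that are finite and actually vary across samples.
row_range <- apply(log_mat, 1, function(x) diff(range(x)))
keep <- is.finite(row_range) & row_range > 0
filtered <- log_mat[keep, , drop = FALSE]

res <- pheatmap(
  filtered,
  scale = "row",                            # z-score per gene
  cluster_cols = FALSE,                     # keep the sample order as given
  clustering_distance_rows = "correlation", # distance named in the paper's methods
  clustering_method = "complete",
  show_rownames = FALSE,
  main = "Row-scaled heatmap, correlation distance"
)
```

If the error persists on the full data set, checking `sum(!is.finite(df_matrix))` and the number of rows removed by `keep` should show whether missing values or constant rows are responsible.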

danlooo