Heatmap.2 cluster not working for similar datasets

Question

I'm using heatmap.2 to plot a number of heatmaps. It works for most of my data, but for some datasets I get this error: Error in hclustfun(distr) : NA/NaN/Inf in foreign function call (arg 10)

This question is similar to what I need.

Here's my code for the heatmap:

library(gplots)
heatmap.2(data.matrix(scaled_df), scale="none",
          trace="none", Rowv=FALSE, dendrogram="column")

My datasets contain NAs, and I want to keep them as they are so that they show up as missing in the heatmap. I'm not sure why the plots work for some of my datasets but not for others, even though the datasets look the same.

I've checked that all data frame columns are numeric in both cases, and plotting without clustering works for the datasets that give the error.

The code works for this dataset:

structure(c(NA, NA, NA, -0.373055352100063, -0.385706401091696, 
-0.391309347218752, -0.37940898600181, NA, NA, NA, -1.30818865300157, 
-1.28100289342474, -1.27516499611363, -1.29618539092743, NA, 
NA, NA, -0.451429907792099, -0.458270686949654, -0.462138953607995, 
-0.455415387710277, NA, NA, NA, -0.176790480220641, -0.195752707175293, 
-0.203272400570155, -0.186355718130766, NA, NA, NA, 1.50659539820666, 
1.52712982958485, 1.51743516237196, 1.51133642013844, 2.04124145231932, 
2.04124145231932, 2.04124145231932, 0.802868994907711, 0.793602859056535, 
0.81445053513857, 0.806029062631844), .Dim = 7:6, .Dimnames = list(
    c("G1_1", "G1_2", "G1_3", "G2_1", 
        "G2_2", "G2_3", "G2_4"), c("R1", "R2", "R5", 
    "R6", "R4", "R3")))

But not for this one:

structure(c(0.261464614279221, 0.255611337873998, 0.726613533871122, 
    0.728606613293338, NA, NA, NA, 0.53883348857398, 0.540410891091193, 
    0.318717521491049, 0.317760075309925, 0.658338893924264, 0.45488158789282, 
    0.454676504730871, 0.55105410441913, 0.552570638827687, 0.326763829307165, 
    0.326621150890079, 0.67112393563164, 0.45683001858747, 0.456604031661036, 
    0.0282153134878549, 0.0288220522147781, 0.3256335271748, 0.326144472474918, 
    0.532476638629399, 0.455536129433743, 0.455571483454085, 0.611222528034844, 
    0.612795125514566, 0.316575462086262, 0.31481866822182, 0.430494870845778, 
    0.369470777918874, 0.369560504987554, NA, NA, NA, NA, -0.432180139707614, 
    0.30045887607367, 0.300788490080856), .Dim = 7:6, .Dimnames = list(
        c("G1_1", "G1_2", "G1_3", "G1_4", "G2_1", 
        "G2_2", "G2_3"), c("R4", "R1", "R2", "R3", 
        "R6", "R5")))

Meisam · Answer 1 · 2023-05-23T05:55:25.753

As stated in the other question you posted, heatmap.2 is supposed to handle NA values unless an entire column or row is NA. Since you are only clustering columns, you probably have a column made entirely of NA values. You can check that with:

sapply(scaled_df, function(x)all(is.na(x)))

If you have too many columns to check manually, wrap it in any():

any(sapply(scaled_df, function(x)all(is.na(x))))

If the result is TRUE, you have to remove those columns before you can continue with your heatmap visualisation.
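The removal step can be sketched as follows. This is only an illustration: the toy data frame and the names all_na_col and cleaned_df are made up here, and the real scaled_df is assumed to be a data frame.

```r
# Hypothetical toy data frame; column "b" is entirely NA.
scaled_df <- data.frame(a = c(1, NA, 3),
                        b = c(NA_real_, NA, NA),
                        c = c(0.5, 0.2, NA))

# TRUE for every column that contains only NA values.
all_na_col <- sapply(scaled_df, function(x) all(is.na(x)))

# Keep the remaining columns; drop = FALSE keeps the result a data frame
# even if only one column survives.
cleaned_df <- scaled_df[, !all_na_col, drop = FALSE]
names(cleaned_df)
```

Columns that are only partly NA (like "a" and "c" here) are kept, so the NAs inside them can still be shown in the heatmap.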

UPDATE

Now that the OP has provided a reproducible example, the cause is clearer: hclust() cannot handle NA distances, and dist() returns NA for any pair of columns that share no row in which both are non-NA. That happens in the second dataset, which has no row with entirely non-NA values: R4 is NA in rows G2_1 to G2_3 and R5 is NA in rows G1_1 to G1_4, so the two columns never overlap. (This is different from the more common case of a row/column made entirely of NAs.) The best solution I could find is based on this post, which suggests replacing NA values with a value slightly above the maximum and then setting a break in the palette so that values above the maximum are drawn in a missing-value color. Although this is a practical solution, it is not perfect, since changing NAs to values affects the clustering. Does anyone have a better idea?
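To see which columns cause the failure, one can inspect the distance matrix that column clustering is built on. The sketch below uses the failing dataset from the question, rounded to three decimals. The helper dist_fill_na is hypothetical (it is not part of gplots), and it is a different workaround from the one in the linked post: it fills NA distances rather than NA data values, so the heatmap cells themselves stay NA. Like any imputation, it still influences the resulting tree.

```r
# Failing matrix from the question (values rounded); one line per column.
m <- matrix(c(
  0.261, 0.256, 0.727, 0.729,     NA,    NA,    NA,   # R4
  0.539, 0.540, 0.319, 0.318,  0.658, 0.455, 0.455,   # R1
  0.551, 0.553, 0.327, 0.327,  0.671, 0.457, 0.457,   # R2
  0.028, 0.029, 0.326, 0.326,  0.532, 0.456, 0.456,   # R3
  0.611, 0.613, 0.317, 0.315,  0.430, 0.369, 0.370,   # R6
     NA,    NA,    NA,    NA, -0.432, 0.300, 0.301),  # R5
  nrow = 7,
  dimnames = list(c("G1_1", "G1_2", "G1_3", "G1_4", "G2_1", "G2_2", "G2_3"),
                  c("R4", "R1", "R2", "R3", "R6", "R5")))

# Column clustering uses dist(t(m)); an NA entry marks a pair of columns
# with no row in which both are observed.
d <- as.matrix(dist(t(m)))
bad <- which(is.na(d) & upper.tri(d), arr.ind = TRUE)
bad_pairs <- data.frame(col1 = rownames(d)[bad[, "row"]],
                        col2 = colnames(d)[bad[, "col"]])
bad_pairs

# Hypothetical helper: replace NA distances with the largest observed
# distance so that hclust() only receives finite values.
dist_fill_na <- function(x) {
  dd <- dist(x)
  dd[is.na(dd)] <- max(dd, na.rm = TRUE)
  dd
}

hc <- hclust(dist_fill_na(t(m)))

# With gplots installed, the helper can be passed through distfun, and
# na.color sets the color of the untouched NA cells:
# gplots::heatmap.2(m, scale = "none", trace = "none", Rowv = FALSE,
#                   dendrogram = "column", distfun = dist_fill_na,
#                   na.color = "grey")
```

Treating non-overlapping columns as maximally distant is an assumption; whether that is reasonable depends on what the NAs mean in the data.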

edited May 23 '23 at 05:55

answered May 23 '23 at 02:51

I've looked into that, and both the dataset that works and the one that doesn't come up as "TRUE"! Weird? – potatojj May 23 '23 at 03:34
It is weird, I'm sorry but it's almost impossible to help you without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Meisam May 23 '23 at 03:47
I've edited my original question with reproducible examples – potatojj May 23 '23 at 05:00
I have updated my answer; take a look at the other post I mentioned. Apparently the distance matrix passed to hclust contains NA for the second dataset, as there is no row with all non-NA values. Either follow the instructions mentioned above, or see if you can add or manually fill in values so that at least one row with all numbers is available and hclust can handle it. Hope it helps – Meisam May 23 '23 at 05:58
