I have a data frame with one column per attribute and one column for the class label. I am trying to create a heatmap (matrix plot) of all the attributes, with the data points grouped by their class label.
If I convert the attribute columns into a numeric matrix, I can use the heatmap function to create a heatmap:
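library(foreign)  # assumed source of read.arff(); RWeka also provides one
q3 <- read.arff("diabetes.arff")
q3_m <- as.matrix(q3[, 1:8])  # numeric attributes only; column 9 is the class label
heatmap(q3_m, Colv = NA, Rowv = NA)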
However, I can't figure out how to order the rows by the class variable, because I had to drop it from the matrix since it isn't numeric.
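One direction that might work for the base R version, as a sketch rather than a verified solution: sort the data frame by class before building the matrix, and pass the class to `heatmap()` as a colour strip via `RowSideColors`. The small data frame below is just the first six rows and three columns of the sample at the end of this post; the colour names and the `scale = "column"` choice are my own assumptions.

```r
# First six rows / three attributes of the sample data, for illustration only
q3 <- data.frame(
  preg  = c(6, 1, 8, 1, 0, 5),
  plas  = c(148, 85, 183, 89, 137, 116),
  age   = c(50, 31, 32, 21, 33, 30),
  class = factor(c("tested_positive", "tested_negative", "tested_positive",
                   "tested_negative", "tested_positive", "tested_negative"))
)

# Sort the rows so that cases with the same class label are adjacent
q3_s <- q3[order(q3$class), ]
q3_m <- as.matrix(q3_s[, c("preg", "plas", "age")])

# One colour per class (arbitrary choices), looked up for every row
class_cols <- c(tested_negative = "steelblue", tested_positive = "firebrick")
side_cols  <- unname(class_cols[as.character(q3_s$class)])

# Rowv/Colv = NA keeps the sorted order; the side strip marks each row's class
heatmap(q3_m, Rowv = NA, Colv = NA, scale = "column",
        RowSideColors = side_cols)
```

Scaling by column is used here because the attributes are in columns and have very different ranges; the default for `heatmap()` is to scale by row.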
If I transform the data into the long format, I can also make the following heatmap using ggplot:
library(tidyr)    # pivot_longer()
library(ggplot2)
q3_long <- pivot_longer(q3, preg:age, names_to = "Attribute",
                        values_to = "Value")
ggplot(data = q3_long, mapping = aes(x = Attribute, y = class, fill = Value)) +
  geom_raster() +
  xlab(label = "Attribute")
However, this draws only one row per class, so all the cases in a class are collapsed into a single cell per attribute (the values appear to be averaged) rather than each case being shown as a separate row with its own fill.
How can I combine these approaches to get a heatmap that clusters the cases by class?
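For the ggplot version, something along these lines might be a starting point. This is only a sketch under my own assumptions: the `case` id column and the `Scaled` column are names I made up, `geom_tile()` is swapped in for `geom_raster()`, and the data is again the first six rows and three columns of the sample at the end of this post. The idea is to put one row per case on the y axis and use the class only to split the plot into panels.

```r
library(tidyr)
library(ggplot2)

# First six rows / three attributes of the sample data, for illustration only
q3 <- data.frame(
  preg  = c(6, 1, 8, 1, 0, 5),
  plas  = c(148, 85, 183, 89, 137, 116),
  age   = c(50, 31, 32, 21, 33, 30),
  class = factor(c("tested_positive", "tested_negative", "tested_positive",
                   "tested_negative", "tested_positive", "tested_negative"))
)

# Give every case its own id so that each one gets a separate row in the plot
q3$case <- factor(seq_len(nrow(q3)))

q3_long <- pivot_longer(q3, c(preg, plas, age),
                        names_to = "Attribute", values_to = "Value")

# Rescale each attribute to [0, 1] so that large-valued columns do not
# dominate the fill (assumes no attribute is constant)
q3_long$Scaled <- ave(q3_long$Value, q3_long$Attribute,
                      FUN = function(v) (v - min(v)) / (max(v) - min(v)))

# One tile per case and attribute; one panel per class
p <- ggplot(q3_long, aes(x = Attribute, y = case, fill = Scaled)) +
  geom_tile() +
  facet_grid(rows = vars(class), scales = "free_y", space = "free_y") +
  labs(x = "Attribute", y = "Case")
print(p)
```

With `scales = "free_y"` and `space = "free_y"`, each panel shows only the cases that belong to its class, and the panel heights follow the number of cases.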
(Apologies in advance - I attempted to include images here, but I just joined Stack Overflow and therefore don't have the 10 reputation points needed to include images.)
Thanks for your help.
Edit: here is a sample of the data (the first 20 rows, as dput output). It is also publicly available - the diabetes.arff dataset is downloaded automatically with the Weka installation (https://waikato.github.io/weka-wiki/downloading_weka/).
structure(list(preg = c(6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10,
10, 1, 5, 7, 0, 7, 1, 1), plas = c(148, 85, 183, 89, 137, 116,
78, 115, 197, 125, 110, 168, 139, 189, 166, 100, 118, 107, 103,
115), pres = c(72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74,
80, 60, 72, 0, 84, 74, 30, 70), skin = c(35, 29, 0, 23, 35, 0,
32, 0, 45, 0, 0, 0, 0, 23, 19, 0, 47, 0, 38, 30), insu = c(0,
0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 175, 0, 230, 0,
83, 96), mass = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31, 35.3,
30.5, 0, 37.6, 38, 27.1, 30.1, 25.8, 30, 45.8, 29.6, 43.3, 34.6
), pedi = c(0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248,
0.134, 0.158, 0.232, 0.191, 0.537, 1.441, 0.398, 0.587, 0.484,
0.551, 0.254, 0.183, 0.529), age = c(50, 31, 32, 21, 33, 30,
26, 29, 53, 54, 30, 34, 57, 59, 51, 32, 31, 31, 33, 32), class = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 2L), .Label = c("tested_negative", "tested_positive"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")