I have a dataframe with three variables of interest: LGA(Location), Offence Category and Total (numeric)
What I am hoping to do, is compare the distance/similarity between each LGA, based on the Total value, in order to create a heat map or similar structure. Is this possible? And if so, what would the process be?
Here is a snippet of the data frame:
Asked
Active
Viewed 563 times
-1

camille
- 16,432
- 18
- 38
- 60

Luke Hansen
- 17
- 3
-
This is currently too broad, and not [reproducible](https://stackoverflow.com/q/5963269/5325862). Take a look again at [ask] and the [mcve] guidance to see how you can narrow down the question – camille Jun 08 '21 at 02:45
-
Are you looking to build a [chloropleth](https://blog.stata.com/wp-content/uploads/2020/04/covid19_map2.png)? And if so is there lat/lon data? What exactly is LGA in terms of counties or cities? – Kent Orr Jun 08 '21 at 02:46
1 Answers
0
I don't really understand your question, but here is an example of a heatmap and a clustered heatmap for 'similar' data:
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
df2 <- df %>%
# format the data to "long format" for plotting
pivot_longer(cols = -c(`Premises type`)) %>%
# Change "Premises type" and "name" to factors
mutate(`Premises type` = factor(
`Premises type`, levels = unique(`Premises type`))
) %>%
mutate(name = factor(
name, levels = unique(name))
) %>%
# Remove the "Total" counts
filter(`Premises type` != "Total")
# Define colours for text (white for dark fill, black for light fill)
hcl <- farver::decode_colour(viridisLite::inferno(length(df2$value)), "rgb", "hcl")
label_col <- ifelse(hcl[, "l"] > 50, "black", "white")
# Plot the data (log scale for fill)
ggplot(df2, aes(y = fct_rev(`Premises type`),
x = name, fill = log(value))) +
geom_tile() +
geom_text(aes(label = value, color = factor(value)),
show.legend = FALSE, size = 2.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1.05),
axis.title = element_blank()) +
scale_color_manual(values = label_col) +
scale_fill_viridis_c(option = "inferno", na.value = "black")
And a clustered heatmap (similar Premises Type / Crime types cluster together):
# Load the raw data and format for pheatmap (expects a matrix)
dm <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit() %>%
column_to_rownames(var = "Premises type")
# Plot the data
pheatmap::pheatmap(as.matrix(dm), scale = "row")
Edit
I haven't used it before, so I don't know if the output is correct, but based on this SO post you can use cluster::daisy()
to get the gower dissimilarity for "Premises Type" then plot using pheatmap, e.g.
library(cluster)
pheatmap::pheatmap(as.matrix(daisy(dm)))
Edit 2
You only need two variables for this heatmap (i.e. "Local government Area" (Character) and "Total" (Numeric) should be fine):
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
library(cluster)
library(pheatmap)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
# Select two variables, then set the Premises type as the rownames
df3 <- df %>%
select(`Premises type`, Robbery) %>%
column_to_rownames(var = "Premises type")
# (in your case, use "column_to_rownames(`Local government Area`)"
# Then plot the heatmap
pheatmap(daisy(as.matrix(df3)),
labels_row = rownames(df3),
labels_col = rownames(df3))

jared_mamrot
- 22,354
- 4
- 21
- 46
-
Thanks Jared, much appreciated. I was planning to use gower distance for the similarity – Luke Hansen Jun 08 '21 at 03:59
-
I've updated my answer, but I have no idea if the answer is "correct" or not – jared_mamrot Jun 08 '21 at 04:23
-
That is exactly what I am after. I am just not sure how to get there, I get an error saying: error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid type character for column numbers 3, 4invalid type character for column numbers 3, 4. Basically, I have a data frame now with two variables: Local government Area (Character) and Total (Numeric). How would I depict these variables in a heatmap like the one you have created above with pheatmap? – Luke Hansen Jun 08 '21 at 10:30
-
You only need two variables for this style heatmap - see my 2nd edit above – jared_mamrot Jun 08 '21 at 11:15