
Good morning everyone,

I need to plot the centroid of a contour plot. Consider this code as an example:

library(ggplot2)
set.seed(1)
df <- data.frame(x = rnorm(50), y = rnorm(50))
ggplot() +
  geom_density2d(data = df, aes(x, y), color = "#ff0000", bins = 5) +
  geom_point() +
  theme(axis.title = element_blank(),
        panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(size = 0.5, linetype = "solid", colour = "black"),
        panel.background = element_rect(fill = "white", colour = "white", size = 10, linetype = "solid"),
        text = element_text(family = "sans"))

I obtain an image like this:

enter image description here

How can I find and draw the centroid of this figure?

Thanks in advance!!

tjebo
Tia Cava
  • I'm not really an expert, but doesn't the centroid differ between the contours? Do you want the centroid of the first, second, or last contour? Also, what about computing the mean values of x and y? – Maël Apr 08 '22 at 09:21
  • How are you defining the "centroid"? Each polygon has a centroid. Each polygon, as a contour, also defines a level - do you want that to influence the centroid - are you looking for the X-Y coordinate that this 3d contoured shape would balance on, then do you want to consider the contours as level steps or would it be better to consider a smooth surface? Do you really care about the exact contours from ggplot or the exact way it makes contours or should you start from your data? – Spacedman Apr 09 '22 at 09:42

2 Answers


There are really two questions here:

  1. How do I extract a polygon from a ggplot?
  2. How do I work out the centroid of a polygon?

If we start with the plot from your code:

library(ggplot2)
set.seed(1)

df <- data.frame(x = rnorm(50), y = rnorm(50))

p <- ggplot() +
  geom_density2d(data = df, aes(x, y), color = "#ff0000", bins = 5)+
  geom_point() +
  theme(axis.title= element_blank(), 
        panel.border = element_blank(), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        axis.line = element_line(size = 0.5, 
                                 linetype = "solid", colour = "black"), 
        panel.background = element_rect(fill = "white", colour = "white", 
                                        size = 10, linetype = "solid"), 
        text = element_text(family = "sans"))

Then we can get a data frame of the x, y co-ordinates of all 4 polygons making up the contours in your ggplot by doing:

contours <- layer_data(p)

As noted in the comments, since there are multiple contour lines, there are multiple centroids. I assume you are looking for the centroid of the central contour. We can get that by doing:

contours <- contours[contours$piece == 5,]
pracma::poly_center(contours$x, contours$y)
#> [1]  0.1131070 -0.1200828

So to plot the centroid, we need only do:

p + geom_point(aes(x = 0.1131070, y = -0.1200828))

enter image description here
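Since each contour piece has its own centroid, the same idea can be applied to every piece at once. The following is only a sketch: it assumes the `ggplot2` and `pracma` packages are installed, that `layer_data()` returns a `piece` column as used above, and that each piece is a closed loop (a contour that runs off the edge of the density grid is not a proper polygon, so its "centroid" is less meaningful). The names `centroids` and `p_all` are illustrative.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(50), y = rnorm(50))

p <- ggplot() +
  geom_density2d(data = df, aes(x, y), color = "#ff0000", bins = 5)

# x, y co-ordinates of every contour line drawn by the first layer
contours <- layer_data(p)

# One centroid per contour piece
centroids <- do.call(rbind, lapply(split(contours, contours$piece), function(d) {
  ctr <- pracma::poly_center(d$x, d$y)
  data.frame(piece = d$piece[1], x = ctr[1], y = ctr[2])
}))

# Overlay all centroids on the contour plot; print p_all to draw it
p_all <- p + geom_point(data = centroids, aes(x, y))
```

Passing the centroids as a data frame also avoids hard-coding the numbers into `aes()`.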

Note that this is not the same as the mean of x and mean of y:

p + 
  geom_point(aes(x = 0.1131070, y = -0.1200828)) +
  geom_point(aes(x = mean(df$x), y = mean(df$y)), colour = 'green')

enter image description here
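For readers who would rather not depend on an extra package, the usual area-weighted centroid of a simple polygon (the "shoelace" formula) can be sketched in base R. This is an illustration under the assumption that the contour is a closed, non-self-intersecting polygon; `polygon_centroid` is a hypothetical helper name, and it is not claimed to match `pracma::poly_center` line for line.

```r
# Area-weighted centroid of a simple (non-self-intersecting) polygon,
# given its vertices in order (clockwise or counter-clockwise).
polygon_centroid <- function(x, y) {
  n <- length(x)
  # Drop a repeated closing vertex, if present
  if (n > 1 && x[1] == x[n] && y[1] == y[n]) {
    x <- x[-n]
    y <- y[-n]
  }
  # "Next" vertex for each vertex, wrapping around to the first
  xn <- c(x[-1], x[1])
  yn <- c(y[-1], y[1])
  # Cross products of consecutive vertices; their sum is twice the signed area
  cross <- x * yn - xn * y
  area <- sum(cross) / 2
  c(x = sum((x + xn) * cross) / (6 * area),
    y = sum((y + yn) * cross) / (6 * area))
}

# A 2 x 2 square with a corner at the origin has its centroid at (1, 1)
polygon_centroid(c(0, 2, 2, 0), c(0, 0, 2, 2))
```

With the filtered `contours` data frame from above, the call would be `polygon_centroid(contours$x, contours$y)`.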

Allan Cameron
  • There are a few closely related threads about centroid calculation, which can be found via this thread (see its linked threads): https://stackoverflow.com/questions/9441436/ggplot-centered-names-on-a-map?noredirect=1&lq=1 In particular, teunbrand's ggh4x package offers a stat that computes the centroid (he also answered in the linked thread). – tjebo Apr 10 '22 at 18:05
  • Thanks @tjebo - I think where this question differs is that it specifically asks for the centroid of a contour line, so the answer was more about extracting the contour lines from a plot and getting the centroid of that, but handy to know about Teun's stat! – Allan Cameron Apr 10 '22 at 18:40

It is an interesting yet hard exercise. geom_density2d is just plotting lines based on point densities, according to certain parameters (bins = 5). You can approximate the "center of mass" of your points using spatial analysis:

library(ggplot2)
set.seed(1)
df <- data.frame(x = rnorm(50), y = rnorm(50))

# Spatial analysis
library(sf)
library(raster)

# Your points as spatial
points <- st_as_sf(df, coords=c("x", "y"))

# Create a raster (grid 10x10)
rast <- raster(points, ncols=10, nrows=10)
# Count the number of points on each pixel
rasterized <- rasterize(points, rast, fun="count")

plot(rasterized)

enter image description here

With that we have found which pixel (square) contains the most points, and therefore has the highest density of points. Now we can extract the coordinates of that pixel and plot it:


df_points <- as.data.frame(rasterized, xy=TRUE, na.rm=TRUE)
cent <- df_points[df_points$layer == max(df_points$layer), ]
cent$label <- "centroid?"


ggplot() +
  geom_density2d(data = df, aes(x, y), color = "#ff0000", bins = 5) +
  geom_point(data=cent, aes(x, y, color=label)) +
  scale_color_manual(values="green") + 
  # For contrast only
  geom_sf(data=points, alpha=0.15) +
  theme(
    axis.title = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    axis.line = element_line(size = 0.5, linetype = "solid", colour = "black"),
    panel.background = element_rect(fill = "white", colour = "white", size = 10, linetype = "solid"),
    text = element_text(family = "sans")
  )

enter image description here

The point in cent is not the centroid of the contour plot; it represents the cell where the highest concentration of points is located.
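A related option that avoids choosing a coarse grid by hand is to work with the kernel density estimate itself. `geom_density2d` is documented as using `MASS::kde2d()`, so the sketch below calls it directly and reports both the grid location of the density peak and the density-weighted mean position. The grid size `n = 100` and the default bandwidth are assumptions here, so the surface may not match the plotted contours exactly; `peak` and `weighted_center` are illustrative names.

```r
library(MASS)

set.seed(1)
df <- data.frame(x = rnorm(50), y = rnorm(50))

# 2D kernel density estimate on a 100 x 100 grid
# (rows of dens$z correspond to dens$x, columns to dens$y)
dens <- kde2d(df$x, df$y, n = 100)

# Grid location with the highest estimated density
idx <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak <- c(x = dens$x[idx[["row"]]], y = dens$y[idx[["col"]]])

# Density-weighted mean position over the whole grid
weighted_center <- c(
  x = sum(dens$x * rowSums(dens$z)) / sum(dens$z),
  y = sum(dens$y * colSums(dens$z)) / sum(dens$z)
)
```

Either point can then be added to the plot with `geom_point()`, in the same way as `cent` above.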

You can also compute the mean of your coordinates...

ggplot() +
  geom_density2d(data = df, aes(x, y), color = "#ff0000", bins = 5) +
  geom_point(aes(mean(df$x), mean(df$y))) +
  # For contrast only
  geom_sf(data=points, alpha=0.15) +
  theme(
    axis.title = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    axis.line = element_line(size = 0.5, linetype = "solid", colour = "black"),
    panel.background = element_rect(fill = "white", colour = "white", size = 10, linetype = "solid"),
    text = element_text(family = "sans")
  )


enter image description here

Hope that helps!

dieghernan
  • When I run `cent <- df_points[df_points$layer == max(df_points$layer), ]` it returns: Error: object "df_points" not found @dieghernan – Tia Cava Apr 08 '22 at 10:29
  • Yes, sorry, use `df_points <- as.data.frame(rasterized, xy=TRUE, na.rm=TRUE)`. I am updating the answer – dieghernan Apr 08 '22 at 10:33
  • Is the grid 100x100 chosen arbitrarily? @dieghernan – Tia Cava Apr 08 '22 at 13:59
  • Yes, and in fact there is a mistake: it is 10x10. Beware of the size too, since either too few cells (1x1) or too many (1000x1000) won't produce meaningful results. If you feel this approach fits your needs, my advice is to play with the cell size until you get the results you want. – dieghernan Apr 08 '22 at 14:16