
I am trying to create a ggplot that includes coordinates, density, and a convex hull polygon.

The data is a set of twenty latitudinal and longitudinal points.

This is my code:

# Packages

library(plyr)     # provides ddply()
library(ggplot2)

# Data

economy <- read.csv("data-economy.csv", header=TRUE)

# Convex hulls.

hulls <- ddply(economy, .(Latitude, Longitude), function(economy) 
            economy[chull(economy$Latitude, economy$Longitude), ])

fig <- ggplot(economy, aes(Latitude, Longitude, colour="black", fill="black")) + 
              geom_point() + 
              geom_density2d(alpha=.5) + 
              labs(x = "Latitude", y = "Longitude") + 
              geom_polygon(data=hulls, alpha=.2)

fig

The resulting plot looks like this:

economypolygon1

I've tried a few things, and I can't get the convex hull to include only the points with max latitude and longitude. I can get the shape that I want outside of ggplot by using this code:

X <- economy
chull(X)
plot(X, cex = 0.5)
hpts <- chull(X)
hpts <- c(hpts, hpts[1])
lines(X[hpts, ])

The result that it gives me is this:

economypolygon2

How can I get the same shape in ggplot as I get in base R?

Also, why doesn't changing the color in my ggplot code change the plot?

Sakhri Houssem
caira
  • See e.g. https://stats.stackexchange.com/questions/22805/how-to-draw-neat-polygons-around-scatterplot-regions-in-ggplot2 – Mikko Marttila Feb 08 '18 at 21:17
  • @Mikko Marttila Thanks. I used the code from that question to come up with what I have so far. I'm not seeing why it's not using only the outside points from my data - any ideas? – caira Feb 12 '18 at 17:02
  • In your `ddply` call you are splitting your data by unique values of `Latitude` and `Longitude` (i.e. distinct points) and finding the convex hull for each point, which is just the point itself. – Mikko Marttila Feb 12 '18 at 17:17
  • Try just doing `hulls <- economy[chull(economy$Latitude, economy$Longitude), ]` – Mikko Marttila Feb 12 '18 at 17:18
  • The answer in the linked Cross Validated post uses `ddply` in order to find a convex hull for multiple groups at the same time; while in your problem you just have one set of points to find a solution for, so you don't need `ddply` here. – Mikko Marttila Feb 12 '18 at 17:20

1 Answer


Your problem is with the `ddply` call: currently your code splits the data by distinct values of `Latitude` and `Longitude` (i.e. each point you're plotting), finds the convex hull within each split (which is just the point itself), and binds the results together, effectively giving you one row for each point in your data. That's why the polygon you draw touches every point.
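
To see the effect on a small scale, here is a rough sketch in base R. The `pts` data frame is made up for illustration (a unit square plus one interior point), and the `split()`/`lapply()` combination is only a base R stand-in for the `ddply` call, not the same function:

```r
# Hypothetical stand-in data: four corners of a square plus one interior point
pts <- data.frame(
  Latitude  = c(0, 0, 1, 1, 0.5),
  Longitude = c(0, 1, 1, 0, 0.5)
)

# Base R analogue of splitting by (Latitude, Longitude):
# one piece per distinct coordinate pair, then a hull for each piece
pieces <- split(pts, list(pts$Latitude, pts$Longitude), drop = TRUE)
per_point <- do.call(rbind, lapply(pieces, function(d) {
  d[chull(d$Latitude, d$Longitude), ]
}))

# One hull computed over the whole data set
whole <- pts[chull(pts$Latitude, pts$Longitude), ]

nrow(per_point)  # every point survives, including the interior one
nrow(whole)      # only the corner points remain
```

The per-piece version returns every row because the hull of a single point is that point, whereas the single call drops the interior point.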

Here's a solution that should work:

library(tidyverse)

# Find the convex hull of the points being plotted
hull <- mtcars %>%
  slice(chull(mpg, wt))

# Define the scatterplot
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point(shape = 21)

# Overlay the convex hull
p + geom_polygon(data = hull, alpha = 0.5)
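
Applied to the plot from the question, the same idea might look roughly like the sketch below. The real `data-economy.csv` is not available here, so the `economy` data frame is a made-up stand-in of twenty random points; only the column names are taken from the question:

```r
library(ggplot2)

# Hypothetical stand-in for read.csv("data-economy.csv", header = TRUE)
set.seed(1)
economy <- data.frame(
  Latitude  = runif(20, 30, 40),
  Longitude = runif(20, -10, 10)
)

# A single hull over the whole data set; no ddply needed
hull <- economy[chull(economy$Latitude, economy$Longitude), ]

fig <- ggplot(economy, aes(Latitude, Longitude)) +
  geom_point() +
  geom_density2d(alpha = 0.5) +
  geom_polygon(data = hull, alpha = 0.2) +
  labs(x = "Latitude", y = "Longitude")

fig
```

`chull()` returns the hull vertices in order around the boundary, so the rows of `hull` can be passed to `geom_polygon()` without re-sorting.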

Now, if you want to add a grouping to your plot, all you need to do is calculate the convex hull for each level of your grouping variable:

# Calculate the hulls for each group
hull_cyl <- mtcars %>%
  group_by(cyl) %>%
  slice(chull(mpg, wt))

# Update the plot with a fill group, and overlay the new hulls
p + aes(fill = factor(cyl)) + geom_polygon(data = hull_cyl, alpha = 0.5)

Created on 2018-02-12 by the reprex package (v0.2.0).
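
On the second part of the question (why changing the colour has no visible effect): a likely cause is that `colour = "black"` sits inside `aes()`. There, ggplot2 treats the string as a data value to map through a colour scale, not as a literal colour, so any string ends up with the same default scale colour. A small sketch of the difference, using `mtcars` as above:

```r
library(ggplot2)

# Inside aes(): "black" is a constant data value, mapped through the
# default discrete colour scale (and it also produces a legend)
p_mapped <- ggplot(mtcars, aes(mpg, wt, colour = "black")) +
  geom_point()

# Outside aes(): colour is set directly to the literal value
p_set <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(colour = "black")

# Inspect the colours actually used by each layer
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```

The same applies to `fill`: to use a fixed fill, set it as an argument of the layer (for example in `geom_polygon()`), not inside `aes()`.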

By the way, there's also a nice example in one of the ggplot2 vignettes, where they go through a step-by-step guide to creating custom stats and geoms, using the convex hull as an example: https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html.
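
For reference, a condensed sketch along the lines of that vignette is shown below. The names `StatChull` and `stat_chull` follow the vignette's example and are not exported by ggplot2 itself; treat this as an illustration, not a drop-in implementation:

```r
library(ggplot2)

# A custom stat that keeps only the convex hull rows of each group
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# Layer constructor wrapping the stat; draws with a polygon geom by default
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# The hull is now computed inside the plot, per group if a grouping is mapped
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")

p
```

With this approach there is no separate hull data frame to keep in sync with the plotted data.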

Mikko Marttila