
I've been researching this for a while now but haven't come across any solution that fits my needs or that I could adapt to work in my case:

I have a large car-sharing data set for multiple cities in which I have the charging demand per parking location (e.g. row = carID, 55.63405, 12.58818, charging demand). I would now like to split the area of a city (the example coordinates are in Copenhagen) into a hexagonal grid and tag every parking location with a cell ID (e.g. row = carID, 55.63405, 12.58818, charging demand, cell ABC) so I know which hexagonal cell it belongs to.

So my question is twofold: (1) How can I create such a honeycomb grid with a side length of 124 meters (about 40,000 m² per cell, equivalent to a 200 x 200 m square but nicer as a hexagon) over this area:

my_area <- structure(list(longitude = c(12.09980, 12.09980, 12.67843, 12.67843), 
                       latitude = c(55.55886, 55.78540, 55.55886, 55.78540)), 
                  .Names = c("longitude", "latitude"), 
                  class = "data.frame", row.names = c(NA, -4L))

(2) How can I then associate all of my points on the map with a certain grid cell?

I'm really lost at this point. I have tried lots of packages (rgdal, hexbin, sp, raster, rgeos, rasterVis, dggridR, ...), but none of them got me to where I want to go. Help is much appreciated!

Parking data example:

                  id latitude longitude           timestamp charging_demand
1: WBY1Z210X0V307780 55.68387  12.60167 2016-07-30 12:35:07              22
2: WBY1Z210X0V307780 55.63405  12.58818 2016-07-30 16:35:07              27
3: WBY1Z210X0V307780 55.68401  12.49015 2016-08-02 16:00:08              44
4: WBY1Z210X0V307780 55.68694  12.49146 2016-08-03 13:40:07               1
5: WBY1Z210X0V307780 55.68564  12.48824 2016-08-03 14:00:07              66
6: WBY1Z210X0V307780 55.66065  12.60569 2016-08-04 16:19:15              74
Jonathan
  • I was looking into this recently. I'll be interested to see if there's a way to do this in R, but I'm probably going to end up using the open-source QGIS GIS package and the mmqgis plugin (or maybe the Processing plugin). – Sharon Mar 20 '17 at 01:22
  • You probably want to do this in projected coordinates, not in longitude/latitude degrees, in order to get (close to) real hexagons. – Edzer Pebesma Mar 20 '17 at 10:51
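
A minimal sketch of that projection step, assuming the sp and rgdal packages and UTM zone 33N (EPSG:32633), which covers Copenhagen:

library(sp)
library(rgdal)

# my_area as defined above, currently in WGS84 longitude/latitude
pts <- my_area
coordinates(pts) <- ~ longitude + latitude
proj4string(pts) <- CRS("+init=epsg:4326")

# reproject to UTM zone 33N so coordinates (and hence hexagon sizes) are in meters
pts_utm <- spTransform(pts, CRS("+init=epsg:32633"))
coordinates(pts_utm)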

1 Answer


I think you can indeed use the hexbin package. Call the function like this:

h <- hexbin(data_x, data_y, xbins = nbins, xbnds = range_x, ybnds = range_y, IDs = TRUE)
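
For example, with the parking data from the question in a data frame called parking (a name assumed here for illustration) and the bounding box my_area, the call could look roughly like this:

library(hexbin)

# parking: data.frame with columns longitude, latitude, charging_demand
# xbins = 50 is purely illustrative; derive it from the bounding-box width
# divided by the desired cell width (~214 m for a 124 m side length)
h <- hexbin(parking$longitude, parking$latitude,
            xbins = 50,
            xbnds = range(my_area$longitude),
            ybnds = range(my_area$latitude),
            IDs   = TRUE)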

The result has a slot cID (accessed as h@cID) which tells you, for each observation, the cell it falls into. You can use this to e.g. calculate the average charging demand per cell:

tapply(charging_demand, h@cID, FUN = mean)

Additionally, you can use hcell2xy to get the coordinates of the cell centres, which you can use for plotting with ggplot. For an example you can look at this answer.
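
A rough sketch of that plotting step, along the lines of the linked answer, assuming ggplot2 and the h and parking objects from the illustrative example above:

library(ggplot2)

# hexagon centre coordinates plus the per-cell mean charging demand
cells <- data.frame(hcell2xy(h),
                    demand = tapply(parking$charging_demand, h@cID, mean))

ggplot(cells, aes(x = x, y = y, fill = demand)) +
  geom_hex(stat = "identity") +
  coord_equal()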

AEF
  • Thanks! Just the answer I was looking for - worked beautifully! I calculated the longitude distance between the two outermost points on the same latitude and then divided it by the height (214 meters) of the hexagons I am aiming for to determine the number of bins. Now that I have the data frame with the hexagon centroids (hcell2xy), I would like to assign the cIDs to each parking instance (sort of the reverse of what I did). Eventually every row listed above should have a new column with its cID. Is there an elegant way to do this with hexbin? Otherwise I would look for min(distance to centroid) per row. – Jonathan Mar 20 '17 at 15:16
  • You're welcome! The field h@cID already corresponds to the input data. So if your table/data.frame is called `df`, you can just do `df$cID <- h@cID` to add the new column. – AEF Mar 20 '17 at 16:05
  • I have a follow-up question - would you happen to know how to extract the adjacent cells? For each cell, I would like to find out how much potential charging demand is aggregated around it. E.g. for cell x I would like to extract the cell IDs of the six adjacent cells and add one column to my final data frame that holds how much aggregated demand there was around cell x. Thanks much in advance! – Jonathan May 11 '17 at 17:16
  • I think you can use `smooth.hexbin`, which is also included in the `hexbin` package, with a weight vector of `c(1, 1, 0)`. However, I do not know how the `cID`s of the result correspond to the `cID`s of the original binning. Presumably they are the same, but be sure to check. Alternatively you could compute the centers of the bins and do a 6-nearest-neighbor search for each cell. – AEF May 12 '17 at 06:09
  • Thanks! I played around with `smooth.hexbin()` for a while but couldn't get it to work the way I wanted. The function only allows me to smooth the demand (I replaced _h@count_ with _charging demand_) into adjacent cells. I, however, need to sum all demand from adjacent cells and add it to my data as a second column so I have, on a cell-by-cell level, _charging-demand-per-cell_ and _adjacent-cell-demand_. Your second alternative sounds suitable; how would you go about finding the six nearest neighbors? I started with distance matrices but that drifts out of proportion quickly... – Jonathan May 23 '17 at 21:47
    How does "smoothing over demand" differ from summing the demand? Isn't the smoother used in this function just a weighted sum? A simple, yet inefficient way to find the nearest neighbours would be to compute the paiwise distances and sort them. Alternatively (mis-)use any knn classfifier function. Most of them give you also the indices of the neighbours. – AEF May 24 '17 at 07:44
  • Thanks! I wound up using `nn2` from the `RANN` package. You can specify a radius and look for the X closest neighbors by lon/lat. Since I'm using hexagons I looked for the six closest (+ the center): `six_closest <- nn2(data = df.hexagons, k = 7, searchtype = "radius", radius = radius)`. Worked like a charm, and with some transforming it makes for a formidable lookup data frame. The one downside is that it sometimes correctly identifies only 4 or 5 of the 6 adjacent cells and picks a cell one step further away instead. In my case this was not so bad because absolute precision was not needed... – Jonathan May 25 '17 at 17:39
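
For reference, a rough sketch of that neighbour lookup with RANN::nn2, reusing the idea of a cells data frame of hexagon centres and per-cell demand from above (column names and the radius value are illustrative):

library(RANN)

# cells: hexagon centres with per-cell demand, e.g. columns x, y, demand
# (built from hcell2xy() and tapply() as sketched above)

# radius slightly larger than the centre-to-centre distance of adjacent cells,
# in the same units as x and y (the value here is purely illustrative)
radius <- 0.005

# for each centre, look up itself plus its six nearest neighbours
nbrs <- nn2(data = cells[, c("x", "y")], k = 7,
            searchtype = "radius", radius = radius)

# drop column 1 (the cell itself); indices are 0 where fewer than k
# neighbours fall within the radius
idx <- nbrs$nn.idx[, -1]
cells$adjacent_demand <- apply(idx, 1, function(i) sum(cells$demand[i[i > 0]]))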