
I have latitude and longitude points:

> d1 <- data.frame(lat, lon)
> head(d1)
       lat       lon
1 43.25724 -96.01955
2 43.25724 -95.98172
3 43.25724 -95.92336
4 43.25616 -96.40973
5 43.25616 -96.25733
6 43.25616 -96.17735

There are 413 of them. I would like to (two ways of saying the same thing):

  • stratify them into 9 groups (arranged in a grid) based on the latitude AND longitude.

  • draw gridlines like a tic-tac-toe board on a plot of lat vs lon and divide the points into bins (stratify) based on the grid cell they fall into.

If I only wanted to divide the latitude into 9 groups, I could use the cut function, but I'm essentially looking for a two-dimensional version of cut.
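For reference, the one-dimensional case might look like the sketch below. The latitudes are synthetic stand-ins (an assumption, since the real data isn't reproduced here), and lat_bin is just an illustrative name.

```r
# Synthetic latitudes standing in for the real column (assumption).
set.seed(1)
lat <- runif(413, min = 42.5, max = 43.3)

# Passing a single number to cut() splits the observed range into that many
# equal-width intervals; labels = FALSE returns integer bin codes.
lat_bin <- cut(lat, 9, labels = FALSE)

# Number of points per latitude band.
table(lat_bin)
```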

EDIT:

Using the suggestion: how can I plot this?

paste(cut(lat, 3, labels=FALSE), cut(lon, 3, labels=FALSE))
  [1] "3 3" "3 3" "3 3" "3 1" "3 2" "3 2" "3 3" "3 3" "3 2" "3 1" "3 2" "3 1" "3 2"
 [14] "3 3" "3 3" "3 1" "3 3" "3 2" "3 2" "3 2" "3 1" "3 3" "3 1" "3 1" "3 3" "3 2"
 [27] "3 2" "3 2" "3 1" "3 2" "3 1" "3 3" "3 1" "3 3" "3 1" "3 2" "3 3" "3 2" "3 2"
 [40] "3 3" "3 3" "3 2" "3 2" "3 2" "3 3" "3 1" "3 3" "3 3" "3 3" "3 2" "3 3" "3 3"
 [53] "3 2" "3 2" "3 3" "3 3" "3 1" "3 2" "3 1" "3 2" "3 2" "3 2" "3 3" "3 2" "3 3"
 [66] "3 3" "3 3" "3 3" "3 3" "3 3" "3 3" "3 1" "3 2" "3 3" "3 1" "3 1" "3 1" "3 1"
 [79] "3 2" "3 2" "3 2" "3 1" "3 3" "3 2" "3 2" "3 2" "3 3" "3 3" "3 1" "3 3" "3 1"
 [92] "3 3" "3 3" "3 1" "3 3" "3 1" "3 3" "3 1" "3 2" "3 3" "3 3" "3 2" "3 2" "3 1"
[105] "3 1" "3 3" "3 2" "3 2" "3 3" "3 3" "3 3" "3 2" "3 1" "3 1" "3 2" "3 2" "3 2"
[118] "3 1" "3 1" "3 2" "3 3" "3 2" "3 2" "3 3" "3 2" "3 1" "3 3" "3 3" "3 1" "3 3"
[131] "3 1" "3 1" "3 3" "2 2" "2 2" "2 1" "2 1" "2 2" "2 3" "2 1" "2 2" "2 2" "2 3"
[144] "2 1" "2 2" "2 3" "2 3" "2 2" "2 3" "2 3" "2 2" "2 2" "2 3" "2 2" "2 1" "2 2"
[157] "2 2" "2 3" "2 3" "2 1" "2 1" "2 2" "2 1" "2 1" "2 1" "2 3" "2 2" "2 3" "2 3"
[170] "2 3" "2 2" "2 3" "2 3" "2 2" "2 1" "2 1" "2 1" "2 2" "2 2" "2 2" "2 2" "2 2"
[183] "2 3" "2 1" "2 2" "2 2" "2 3" "2 3" "2 2" "2 2" "2 3" "2 2" "2 2" "2 2" "2 1"
[196] "2 3" "2 1" "2 2" "2 3" "2 3" "2 1" "2 3" "2 3" "2 1" "2 2" "2 1" "2 2" "2 3"
[209] "2 1" "2 3" "2 2" "2 2" "2 2" "2 3" "2 2" "2 1" "2 2" "2 2" "2 3" "2 3" "2 3"
[222] "2 2" "2 3" "2 2" "2 1" "2 1" "2 2" "2 2" "2 3" "2 2" "2 3" "2 2" "2 2" "2 1"
[235] "2 2" "2 2" "2 3" "2 2" "2 3" "2 3" "2 3" "2 3" "2 1" "2 1" "2 2" "2 2" "2 3"
[248] "2 1" "2 2" "2 3" "2 2" "2 3" "2 3" "2 1" "2 1" "2 3" "2 3" "2 1" "2 3" "2 1"
[261] "2 1" "2 1" "2 3" "2 1" "2 2" "2 2" "2 2" "2 3" "2 3" "2 1" "2 1" "2 2" "2 3"
[274] "2 3" "2 2" "2 2" "2 1" "1 2" "1 2" "1 3" "1 3" "1 1" "1 1" "1 2" "1 2" "1 2"
[287] "1 2" "1 1" "1 3" "1 3" "1 2" "1 1" "1 1" "1 1" "1 2" "1 1" "1 1" "1 3" "1 2"
[300] "1 2" "1 2" "1 3" "1 1" "1 3" "1 1" "1 3" "1 2" "1 1" "1 2" "1 2" "1 2" "1 1"
[313] "1 3" "1 1" "1 1" "1 2" "1 3" "1 1" "1 2" "1 1" "1 2" "1 1" "1 3" "1 2" "1 2"
[326] "1 1" "1 2" "1 3" "1 3" "1 1" "1 2" "1 3" "1 3" "1 1" "1 3" "1 3" "1 1" "1 2"
[339] "1 2" "1 2" "1 3" "1 1" "1 2" "1 3" "1 2" "1 3" "1 3" "1 1" "1 2" "1 2" "1 1"
[352] "1 1" "1 2" "1 2" "1 3" "1 3" "1 1" "1 2" "1 2" "1 3" "1 1" "1 2" "1 2" "1 3"
[365] "1 1" "1 2" "1 1" "1 3" "1 3" "1 1" "1 1" "1 2" "1 2" "1 3" "1 1" "1 3" "1 1"
[378] "1 3" "1 3" "1 1" "1 1" "1 2" "1 3" "1 2" "1 1" "1 2" "1 3" "1 3" "1 2" "1 2"
[391] "1 3" "1 1" "1 2" "1 2" "1 3" "1 2" "1 2" "1 3" "1 1" "1 3" "1 1" "1 2" "1 2"
[404] "1 2" "1 1" "1 3" "1 1" "1 2" "1 1" "1 1" "1 1" "1 3" "1 1"
> 

The problem is that the latitude and longitude must be in the same grid section. I might be wrong, but it doesn't look like that's happening here.

EDIT 2: Something's going wrong...getting NAs.

> df2 <- data.frame(lat, lon)
> df2 <- within(df2, {
+   grp.lat = cut(lat, (0:3)/3, labels = FALSE)
+   grp.lon = cut(lon, (0:3)/3, labels = FALSE)
+ })
> head(df2)
       lat       lon grp.lon grp.lat
1 43.25724 -96.01955      NA      NA
2 43.25724 -95.98172      NA      NA
3 43.25724 -95.92336      NA      NA
4 43.25616 -96.40973      NA      NA
5 43.25616 -96.25733      NA      NA
6 43.25616 -96.17735      NA      NA

FINAL SOLUTION:

# Divide the points into a 3 x 3 grid (9 cells) using equal-width lat and lon intervals (coarse)

df2 <- data.frame(lat, lon)
df2 <- within(df2, {
  grp.lat = cut(lat, 3, labels = FALSE)
  grp.lon = cut(lon, 3, labels = FALSE)
})
head(df2)

# Smallest lon and lat value in each group; these approximate the lower edge of each bin and are used to draw the gridlines

start_grp1_lon <- min(df2$lon[df2$grp.lon==1])
start_grp2_lon <- min(df2$lon[df2$grp.lon==2])
start_grp3_lon <- min(df2$lon[df2$grp.lon==3])

start_grp1_lat <- min(df2$lat[df2$grp.lat==1])
start_grp2_lat <- min(df2$lat[df2$grp.lat==2])
start_grp3_lat <- min(df2$lat[df2$grp.lat==3])

# (grp.lat - 1) * 3 + grp.lon gives each of the 9 cells its own symbol; col = grp.lon colors by lon band
plot(lat ~ lon, data = df2, pch = (15:23)[(grp.lat - 1) * 3 + grp.lon], col = grp.lon)
abline(v = c(start_grp1_lon, start_grp2_lon, start_grp3_lon))
abline(h = c(start_grp1_lat, start_grp2_lat, start_grp3_lat))
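
To check that each point's lat and lon bins really are kept together, the two group columns can be combined into a single cell id. This is only a sketch: the coordinates are synthetic stand-ins for the real data, and cell and by_cell are illustrative names.

```r
# Synthetic coordinates standing in for the real data (assumption).
set.seed(1)
df2 <- data.frame(lat = runif(413, 42.5, 43.3),
                  lon = runif(413, -96.5, -95.9))
df2 <- within(df2, {
  grp.lat <- cut(lat, 3, labels = FALSE)
  grp.lon <- cut(lon, 3, labels = FALSE)
})

# One id per grid cell, 1..9, numbered by lat band first and lon band second.
df2$cell <- (df2$grp.lat - 1) * 3 + df2$grp.lon

# Counts per cell as a 3 x 3 table.
table(lat = df2$grp.lat, lon = df2$grp.lon)

# The points split into one data frame per occupied cell.
by_cell <- split(df2, df2$cell)
```

Each row keeps its own lat and lon, so every point ends up in exactly one cell.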
StatsSorceress
  • something like this? apply `cut` to both columns simultaneously: `df$group <- paste(cut(df$lat, 3, labels=FALSE), cut(df$lon, 3, labels=FALSE))` – chinsoon12 Apr 19 '17 at 00:51
  • That doesn't seem to keep the lat and lon together. – StatsSorceress Apr 19 '17 at 00:57
  • what do you mean by "keep the lat and lon together"? – chinsoon12 Apr 19 '17 at 00:59
  • Hi @chinsoon12, please see my edit – StatsSorceress Apr 19 '17 at 01:01
  • imagine a gridded x-y axis with major axes at x = (0,1,2) and y = (0, 1, 2), and you have 2 points (0.5, 0.5) and (0.5, 1.5). should these fall into the same grid? – chinsoon12 Apr 19 '17 at 01:05
  • `?ggplot2::geom_bin2d` – alistaire Apr 19 '17 at 01:06
  • In EDIT2: use number of intervals instead of break points, such as `grp.lat = cut(lat, 3, labels = FALSE)`. – nya Apr 19 '17 at 05:19
  • No. For the second argument to `cut`, you need to use numbers *relevant to your data*. Right now you are trying to find all `lat`s and `lon`s with 0, 1/3, 2/3, and 3/3, obviously not going to find anything. I suggest something based on `range(df2$lat)` and `range(df2$lon)`. – r2evans Apr 19 '17 at 23:12
  • (See `?cut` for details on the arguments. Note that one of the two ends will be "open-ended", so you may need to *expand* one side to ensure you get all data after using `range`, `min`, or `max` to define your bins.) – r2evans Apr 19 '17 at 23:17

1 Answer


One way is to use cut on each axis.

set.seed(2)
n <- 50
df <- data.frame(x = runif(n), y = runif(n))
head(df)
#           x           y
# 1 0.1848823 0.007109038
# 2 0.7023740 0.014693911
# 3 0.5733263 0.683403423
# 4 0.1680519 0.929720222
# 5 0.9438393 0.275401199
# 6 0.9434750 0.811859695

Now assign the bins, arbitrarily generating a 3x3 grid:

df <- within(df, {
  grp.x = cut(x, (0:3)/3, labels = FALSE)
  grp.y = cut(y, (0:3)/3, labels = FALSE)
})
head(df)
#           x           y grp.y grp.x
# 1 0.1848823 0.007109038     1     1
# 2 0.7023740 0.014693911     1     3
# 3 0.5733263 0.683403423     3     2
# 4 0.1680519 0.929720222     3     1
# 5 0.9438393 0.275401199     1     3
# 6 0.9434750 0.811859695     3     3

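These group columns can now be used for grouping, coloring, and so on. The plot below is just a demonstration: color encodes the y bin (grp.y) and shape encodes the x bin (grp.x), so each of the 9 cells has its own color and shape combination and its points can be processed as a group.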


plot(y ~ x, data = df, pch = (15:17)[grp.x], col = grp.y)
abline(v = (1:2)/3)
abline(h = (1:2)/3)
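
The breaks (0:3)/3 work here only because runif() puts x and y in [0, 1]. For coordinates on another scale, those breaks match nothing and cut returns NA. One option is to build the breaks from the range of the data, as in this sketch. The coordinates are synthetic, and lat_breaks and lon_breaks are illustrative names.

```r
# Synthetic coordinates on a lat/lon scale (assumption, not the asker's data).
set.seed(2)
lat <- runif(50, 42.5, 43.3)
lon <- runif(50, -96.5, -95.9)

# Three equal-width bins need four break points spanning the data range.
lat_breaks <- seq(min(lat), max(lat), length.out = 4)
lon_breaks <- seq(min(lon), max(lon), length.out = 4)

# Intervals are (a, b] by default, so the minimum would get NA without
# include.lowest = TRUE.
grp.lat <- cut(lat, lat_breaks, labels = FALSE, include.lowest = TRUE)
grp.lon <- cut(lon, lon_breaks, labels = FALSE, include.lowest = TRUE)

# Same plot as above, with gridlines at the two interior breaks on each axis.
plot(lon, lat, pch = (15:17)[grp.lon], col = grp.lat)
abline(v = lon_breaks[2:3], h = lat_breaks[2:3])
```

Passing a single number instead, as in cut(lat, 3, labels = FALSE), lets cut choose equal-width breaks itself, at the cost of not having the break values at hand for the gridlines.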
r2evans