1

This is my first time writing code in R from scratch and I'm struggling with how to approach it. I'm looking at turtle nests and their proximity to light sources (i.e. houses, light poles, etc.) to determine how often a light source is within a given radius of a nest.

These are both very large data sets (hundreds of thousands of rows) so the code will likely need to run a loop for each nest position. GPS coordinates for both data sets are in decimal degrees.

The nest data is essentially latitude, longitude, date observed, and species (if known)

The light source data is latitude, longitude, type, and several other light-related parameters I'd like to keep in the data set.

Any suggestions on how to loop through the nest coordinates to determine light sources within radius, r, would be greatly appreciated! For each light source within r for a nest, I'd like for the end result to spit out the entire row of light source data (type, location, additional light-related parameters, etc.) if that is possible rather than just say how many values were T vs. F for being inside r. Thanks!

> Nest <- read.csv("Nest.csv", header=T)
> Lights <- read.csv("Lights.csv", header=T)
> #Nest
> dput(droplevels(Nest[1:10, ]))
structure(list(LAT = c(34.146535, 34.194585, 34.216854, 34.269901, 
34.358718, 34.37268, 34.380848, 34.394183, 34.410384, 34.415077
), LONG = c(-77.839787, -77.804013, -77.787032, -77.742722, -77.63655, 
-77.619872, -77.609373, -77.591654, -77.568456, -77.561256), 
    DATE = structure(c(2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 1L, 7L
    ), .Label = c("2016-05-19T03:12", "2016-05-21T07:23", "2016-05-23T08:14", 
    "2016-05-24T04:21", "2016-05-25T11:15", "2016-05-27T05:12", 
    "2016-05-27T09:45", "2016-05-28T09:42", "2016-05-28T10:18", 
    "2016-05-29T02:26"), class = "factor"), SPECIES = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Cc", class = "factor")), row.names = c(NA, 
10L), class = "data.frame")
> #Lights
> dput(droplevels(Lights[1:10, ]))
structure(list(LAT = c(34.410925, 34.410803, 34.410686, 34.410476, 
34.410361, 34.410237, 34.410151, 34.410016, 34.409821, 34.409671
), LONG = c(-77.568183, -77.568296, -77.568478, -77.568757, -77.568915, 
-77.569135, -77.569355, -77.569527, -77.569707, -77.569905), 
    DATE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = "5/19/2016", class = "factor"), TYPE = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "R", class = "factor"), 
    WATTS = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
10L), class = "data.frame")
  • This question is a little vague for this forum. We tend to discourage things like "any suggestions...", that prompt discussions - there are other forums for that. We like to focus on concrete questions that can be answered with code. To that end, could you append to your question a little bit of sample data? Just a few rows from each data set, provided in valid R syntax, would make your question much more answerable here. (`dput()` can be very useful, e.g., `dput(droplevels(your_data[1:10, ]))` for the first 10 rows). – Gregor Thomas Jun 19 '20 at 03:57
  • Your description is very nice, but a little more context could be helpful - e.g., if your data is latitude/longitude, there are nice packages to calculate great circle distances between points. But, for large data like yours, those comparisons can be expensive. I assume your points are pretty close to each other, so a Euclidean distance approximation is probably good enough, and much more efficient. You want a non-equi join based on distance, which is probably best done with the `data.table` package. But it's hard to provide more details without an example. – Gregor Thomas Jun 19 '20 at 04:02
  • @GregorThomas thanks for you replies! I followed your suggestion and added some code to show a sample of the data sets (I had already broken it down into a subset while I make the code so that things go quicker than if I ran it through the entire data set). The light source data is very close to each other but the nest locations are a lot more spread out. If anything else would be helpful just let me know. – laterperspective Jun 19 '20 at 14:08

1 Answers1

0

As you stated that your data sets were large, the proposed solution tries to avoid a full cartesian product between all Nest vs all Lamps.
For this, we use the non equi join possibilities of data.table which only allows simple operators like > or <.
This allows to make a first filter of the Lamps in a box around a Nest.
This box should be large enough to contain the circle of the max distance to Nest.
In a second step, we calculate the distance on the filtered data (much less calculation than a cartesian product of all data) :

library(data.table)
library(geosphere)

#To data.table
setDT(Nest)
setDT(Lights)

# Define a box around each nest
dlon<- 0.001
dlat <- 0.001

Nest[,c("LATNest","LONGNest","latmin","latmax","longmin","longmax"):=.(LAT,LONG,LAT-dlat, LAT+dlat,LONG-dlon,LONG+dlon)]
Nest[,c("LAT","LONG") :=.(NULL,NULL)]

# Search lights in box
LightNearNest <- Nest[Lights, .(LATNest,LONGNest, LATLight = LAT, LONGLight = LONG), on = .(latmin<LAT , latmax>LAT,longmin<LONG,longmax>LONG),nomatch=0,allow.cartesian=T]     


# Calculate distance 
LightNearNest[,dist:= geosphere::distHaversine(cbind(LONGNest,LATNest),cbind(LONGLight,LATNest))]
LightNearNest

    LATNest  LONGNest LATLight LONGLight      dist
1: 34.41038 -77.56846 34.41092 -77.56818 25.072269
2: 34.41038 -77.56846 34.41080 -77.56830 14.694370
3: 34.41038 -77.56846 34.41069 -77.56848  2.020476
4: 34.41038 -77.56846 34.41048 -77.56876 27.643784
5: 34.41038 -77.56846 34.41036 -77.56892 42.154475
6: 34.41038 -77.56846 34.41024 -77.56914 62.359234
7: 34.41038 -77.56846 34.41015 -77.56936 82.563993
Waldi
  • 39,242
  • 6
  • 30
  • 78
  • Thanks for the reply, this is really helpful! Is there a way to define a circle around the nest location instead of a box? – laterperspective Jun 19 '20 at 21:32
  • As far as I know a circle will be difficult because the join expression only accepts simple operators. See my edit for further explainations – Waldi Jun 19 '20 at 21:35
  • Edit posted : tell me if you were successful on your large data set – Waldi Jun 19 '20 at 21:57
  • Great, thanks for the feedback! So using the box around each nest first is just a way to filter the data set down to something more manageable for a circle/cartesian product around each nest in a subsequent code segment? Also, when you set the box to 0.001 for Lat and Long is that unit degrees? – laterperspective Jun 20 '20 at 02:00
  • Exactly, instead of calculating all the distances between all nests and all lamps, we calculate the distances only for lamps in a box around a nest : far less calculations. The size of the box is in degrees, I took 0.001 because geosphere::distHaversine(c(0,0),c(0.001,0.001))~160m, which makes max 80m from the nest and seemed sensible to me. You can of course increase the size of the box depending on your needs. – Waldi Jun 20 '20 at 05:19
  • Ok that makes sense! I saw this post last week about calculating distances and it confused me: [Calculate distance between two latitude longitude points](https://stackoverflow.com/questions/27928/calculate-distance-between-two-latitude-longitude-points-haversine-formula) Some people said not to use Haversine but was that because of the WG84S coordinate system the poster was using? A slight error in distances wouldn't be something I'd have to worry about if I use decimal degrees and haversine (even if I used it on a future project with much larger distances (several km away) between points? – laterperspective Jun 21 '20 at 03:32
  • In the comments on Haversine calculation I read that it **cannot be garanteed correct to more than better 0.5%**, which means that the maximum error on 100m is 50cm : I imagine you are not looking for a better precision for the almp-nest distance. – Waldi Jun 21 '20 at 04:47
  • Thanks for clarifying, and yes that wouldn't make any difference on what I'm looking at for this project. I have a few other projects to finish before starting this code but I will keep you posted on how it goes. Thanks again for all your help – laterperspective Jun 23 '20 at 02:47
  • OK, I'll be happy to know if this fully answered your question. – Waldi Jun 23 '20 at 04:45
  • did it work? If so, don't forget to [accept the answer](https://stackoverflow.com/help/someone-answers), so that others know it worked ;-) – Waldi Aug 10 '20 at 20:49