I have a data set with some locations:

ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))

and then I have climate data:

clim <- structure(list(lat = c(55.047, 55.097, 55.146, 55.004, 55.054, 
55.103, 55.153, 55.202, 55.252, 55.301), long = c(6.029, 6.0171, 
6.0051, 6.1269, 6.1151, 6.1032, 6.0913, 6.0794, 6.0675, 6.0555
), alt = c(0.033335, 0.033335, 0.033335, 0.033335, 0.033335, 
0.033335, 0.033335, 0.033335, 0.033335, 0.033335), x = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), y = c(1914, 1907.3, 1901.8, 1921.1, 
1914.1, 1908.3, 1902.4, 1896, 1889.8, 1884)), row.names = c(NA, 
10L), class = "data.frame", .Names = c("lat", "long", "alt", 
"x", "y"))

      lat   long      alt x      y
1  55.047 6.0290 0.033335 0 1914.0
2  55.097 6.0171 0.033335 0 1907.3
3  55.146 6.0051 0.033335 0 1901.8
4  55.004 6.1269 0.033335 0 1921.1
5  55.054 6.1151 0.033335 0 1914.1
6  55.103 6.1032 0.033335 0 1908.3
7  55.153 6.0913 0.033335 0 1902.4
8  55.202 6.0794 0.033335 0 1896.0
9  55.252 6.0675 0.033335 0 1889.8
10 55.301 6.0555 0.033335 0 1884.0

What I want to do is "merge" both datasets so that the climate data ends up in ex. The values of lat and long in ex differ from the values of lat and long in clim, so they cannot be merged directly. For each row in ex, I need to find the closest point in clim, considering both lat and long.

The expected output for the example is:

  lat long      alt x      y
1  55    6 0.033335 0 1914.0
2  60    6 0.033335 0 1884.0
3  40   10 0.033335 0 1921.1
Mateusz1981
  • Calculated wrongly, updated – Mateusz1981 May 24 '18 at 07:09
  • Possible duplicate of [Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)](https://stackoverflow.com/questions/31668163/geographic-geospatial-distance-between-2-lists-of-lat-lon-points-coordinates) – Aramis7d May 24 '18 at 07:14
  • I managed to get the answer by @andrew_reece working and have marked it as accepted. If I get the other solution to work, I will revisit my choice. Thank you very much for all the comments – Mateusz1981 May 24 '18 at 09:31

2 Answers

The function dist calculates Euclidean (or other) distances between all pairs of rows in a matrix or data frame, so one way to find the points in clim that are closest to those in ex is:

# Distance between all points in ex and clim combined,
# with distances between points in same matrix filtered out
n <- nrow(ex)
tmp <- as.matrix(dist(rbind(ex, clim[, 1:2])))[-(1:n), 1:n]

# Indices in clim corresponding to the closest points to those in ex
idx <- apply(tmp, 2, which.min)

# Points from ex with additional info from closest points in clim
cbind(ex, clim[idx, -(1:2)])
#>    lat long      alt x      y
#> 1   55    6 0.033335 0 1914.0
#> 10  60    6 0.033335 0 1884.0
#> 4   40   10 0.033335 0 1921.1
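
One caveat worth sketching: dist treats lat and long as plain planar coordinates, so a degree of longitude counts the same as a degree of latitude. If great-circle distance is what matters, a haversine-based lookup could be used instead. The following is only a minimal base R sketch under that assumption; the helper name haversine_km and the 6371 km Earth radius are illustrative choices, not part of the original answer.

```r
# Great-circle (haversine) distance in km from one point to a vector of points.
# Inputs are decimal degrees; r is an assumed mean Earth radius in km.
haversine_km <- function(lat1, long1, lat2, long2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Example data from the question
ex <- data.frame(lat = c(55, 60, 40), long = c(6, 6, 10))
clim <- data.frame(
  lat = c(55.047, 55.097, 55.146, 55.004, 55.054,
          55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt = 0.033335,
  x = 0,
  y = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
        1908.3, 1902.4, 1896, 1889.8, 1884)
)

# For each row of ex, the index of the nearest row in clim
idx <- vapply(seq_len(nrow(ex)), function(i) {
  which.min(haversine_km(ex$lat[i], ex$long[i], clim$lat, clim$long))
}, integer(1))

# Attach the climate columns of the nearest point to ex
result <- cbind(ex, clim[idx, c("alt", "x", "y")])
rownames(result) <- NULL
```

For this small example the matched rows (1, 10 and 4) are the same as with dist; the two approaches can differ when candidate points are spread widely in longitude at high latitudes.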
janusvm

You can find the row index in clim with the lowest absolute difference of lat and long from ex, and then add in the clim columns to ex based on that index.

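library(tidyverse)

ex %>%
  # one group per location in ex
  group_by(lat, long) %>%
  # row index in clim with the smallest summed absolute difference
  summarise(closest_clim = which.min(abs(lat - clim$lat) + 
                                       abs(long - clim$long))) %>%
  # pull the climate columns for that row
  mutate(alt = clim$alt[closest_clim],
         x = clim$x[closest_clim],
         y = clim$y[closest_clim])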

# A tibble: 3 x 6
# Groups:   lat [3]
    lat  long closest_clim    alt     x     y
  <dbl> <dbl>        <int>  <dbl> <dbl> <dbl>
1   40.   10.            4 0.0333    0. 1921.
2   55.    6.            1 0.0333    0. 1914.
3   60.    6.           10 0.0333    0. 1884.
andrew_reece
  • Should I worry about the `warning: in lat-clim$lat: longer object length is not a multiple of shorter object length` when I scale the example up to my whole data set? – Mateusz1981 May 24 '18 at 08:17
  • @Mateusz1981 those warnings are likely due to duplicate points in `ex`, resulting in groupings of `lat` and `long` that have a length > 1. Since they are all the same points, you can get rid of the warnings by putting `first(lat)` and `first(long)` in the `summarise` expression. – janusvm May 24 '18 at 09:15
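
The `first()` suggestion in the comment above could look roughly like this. It is only a sketch: the data frame `ex_dup`, with a deliberately duplicated location, is made up for illustration, and `.groups = "drop"` assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical input: the location (55, 6) appears twice
ex_dup <- data.frame(lat = c(55, 55, 60), long = c(6, 6, 6))

# Climate data from the question
clim <- data.frame(
  lat = c(55.047, 55.097, 55.146, 55.004, 55.054,
          55.103, 55.153, 55.202, 55.252, 55.301),
  long = c(6.029, 6.0171, 6.0051, 6.1269, 6.1151,
           6.1032, 6.0913, 6.0794, 6.0675, 6.0555),
  alt = 0.033335,
  x = 0,
  y = c(1914, 1907.3, 1901.8, 1921.1, 1914.1,
        1908.3, 1902.4, 1896, 1889.8, 1884)
)

nearest <- ex_dup %>%
  group_by(lat, long) %>%
  # first() collapses each group to a single value, so the subtraction
  # against clim$lat / clim$long never recycles a longer vector
  summarise(
    closest_clim = which.min(abs(first(lat) - clim$lat) +
                               abs(first(long) - clim$long)),
    .groups = "drop"
  ) %>%
  mutate(alt = clim$alt[closest_clim],
         x = clim$x[closest_clim],
         y = clim$y[closest_clim])
```

Note that the result has one row per distinct location rather than one per row of `ex_dup`; if every original row is needed, the result can be joined back onto `ex_dup` by lat and long.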