
I want to calculate the distance (in meters) between the lat/long positions in my dataframe, with these constraints:

1.- Only between lat/long positions that share the same CLIENTID, but for every pair of positions within that CLIENTID.
2.- Generate a new dataframe (and export it to CSV or XLS) for each CLIENTID.

My data are stored as:

| CLIENT ID | HOUSE ID | LAT   | LONG  |
|-----------|----------|-------|-------|
| 111111111 | xxx111   | xx.xx | xx.xx |
| 111111111 | xxx112   | xx.xx | xx.xx |
| 111111111 | xxx145   | xx.xx | xx.xx |
| 222222222 | xxx345   | xx.xx | xx.xx |
| 222222222 | xxx 666  | xx.xx | xx.xx |

What I need:

A CSV or XLS file named after the CLIENTID (CLIENTID.csv) with this content:

| CLIENT ID | HOUSE ID 1 | HOUSE ID 2 | DISTANCE |
|-----------|------------|------------|----------|
| 111111111 | xxx111     | xxx112     | 950      |
| 111111111 | xxx111     | xxx145     | 750      |
| 111111111 | xxx112     | xxx145     | 250      |

I've tried some links, but I have no clue how to solve it, because I've been away from computers for a year (COVID).

Links:

Calculating distance between two GPS locations in a data frame using distm () in R

Function to calculate geospatial distance between two points (lat,long) using R

Edit: Adding data

Sorry about the first version of my question. I was completely stuck and unable to ask properly.

I've combined two dataframes (with a left join) to obtain the AAA_JOIN dataframe, which includes "Doc_titular".

Now my problems start:
1.- Filter by "Doc_titular" and get all rows that share the same "Doc_titular".
2.- Calculate the distance between all LAT/LONG pairs.
3.- Store the data in a CSV for each "Doc_titular" with all HouseID distances in the format mentioned above (Doc_titular; HouseId; HouseId(n); Meters).

Here's an example of the data:

Doc_titular House_ID    longitude   latitude
26DF5756F   AAA/BA/00145    -3.36715925514947   3.80089929185657
26DF5756F   AAA/BA/00146    -3.36687508416913   3.80092746460019
26DF5756F   AAA/BA/00733    -3.37604382639631   3.80126114282085
45GH7765B   AAA/BA/00123    -3.36887798896237   3.80405033823961
45GH7765B   AAA/BA/00498    -3.37077717656959   3.80121749925945
45GH7765B   AAA/BA/00998    -3.79037050320006   3.77633839304628
45GH7765B   AAA/BA/00332    -3.38064351196704   3.80099089206718
98TR2794P   AAA/BA/00420    -3.36824907065489   3.80086791973886
98TR2794P   AAA/BA/00557    -3.37255900917349   3.80107792023686
98TR2794P   AAA/BA/00556    -3.36674589155523   3.8012204114931
98TR2794P   AAA/BA/00040    -4.05181620512371   3.80137173136896

Sorry if I'm posting something basic, but I'm not very good at R and I've been far away from computers this year. Thanks in advance.

Jammoyano
    It would be helpful if you could provide us with a reproducible [minimal working example](https://en.wikipedia.org/wiki/Minimal_working_example) that we can copy and paste to better understand the issue and test possible solutions. You can share datasets with `dput(YOUR_DATASET)` or smaller samples with `dput(head(YOUR_DATASET))`. (See [this answer](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) for some great advice.) – ktiu Jun 24 '21 at 17:08

1 Answer


Here is my solution with purrr and geosphere::distm():

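library(purrr)  # also provides the %>% pipe

# Note: the formula in split() and the \(x) lambda shorthand need R >= 4.1
split(your_data, ~ Doc_titular) %>%
  # turn each client's rows into a list of per-house lists
  map(~ pmap(.x, list)) %>%
  # for every pair of houses of a client, build one row of the result
  map_dfr(~ combn(.x, 2, simplify = FALSE) %>% map_dfr(~ do.call(\(h1, h2) {
    c(House_ID_1 = h1$House_ID,
      House_ID_2 = h2$House_ID,
      # c() coerces everything to character, so Distance comes back as <chr>
      Distance = geosphere::distm(c(h1$longitude, h1$latitude),
                                  c(h2$longitude, h2$latitude)))
  }, .x)))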

Returns:

# A tibble: 15 x 3
   House_ID_1   House_ID_2   Distance        
   <chr>        <chr>        <chr>           
 1 AAA/BA/00145 AAA/BA/00146 31.7180146462883
 2 AAA/BA/00145 AAA/BA/00733 987.675672076512
 3 AAA/BA/00146 AAA/BA/00733 1019.09764067029
 4 AAA/BA/00123 AAA/BA/00498 377.66269306226 
 5 AAA/BA/00123 AAA/BA/00998 46918.5688963262
 6 AAA/BA/00123 AAA/BA/00332 1349.94747698525
 7 AAA/BA/00498 AAA/BA/00998 46688.6347143115
 8 AAA/BA/00498 AAA/BA/00332 1096.20190700286
 9 AAA/BA/00998 AAA/BA/00332 45593.1545475677
10 AAA/BA/00420 AAA/BA/00557 479.294723110493
11 AAA/BA/00420 AAA/BA/00556 171.456842525277
12 AAA/BA/00420 AAA/BA/00040 75928.0863123282
13 AAA/BA/00557 AAA/BA/00556 645.89145220337 
14 AAA/BA/00557 AAA/BA/00040 75449.3320106569
15 AAA/BA/00556 AAA/BA/00040 76095.0197361047

(Data used:)

your_data <- structure(list(Doc_titular = c("26DF5756F", "26DF5756F", "26DF5756F", "45GH7765B", "45GH7765B", "45GH7765B", "45GH7765B", "98TR2794P", "98TR2794P", "98TR2794P", "98TR2794P"), House_ID = c("AAA/BA/00145", "AAA/BA/00146", "AAA/BA/00733", "AAA/BA/00123", "AAA/BA/00498", "AAA/BA/00998", "AAA/BA/00332", "AAA/BA/00420", "AAA/BA/00557", "AAA/BA/00556", "AAA/BA/00040"), longitude = c(-3.36715925514947, -3.36687508416913, -3.37604382639631, -3.36887798896237, -3.37077717656959, -3.79037050320006, -3.38064351196704, -3.36824907065489, -3.37255900917349, -3.36674589155523, -4.05181620512371), latitude = c(3.80089929185657, 3.80092746460019, 3.80126114282085, 3.80405033823961, 3.80121749925945, 3.77633839304628, 3.80099089206718, 3.80086791973886, 3.80107792023686, 3.8012204114931, 3.80137173136896)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", "data.frame"))
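
The question also asks for the client ID in each row and for one CSV file per client, which the pipeline above does not produce. A minimal base-R sketch of that step follows, under these assumptions: `haversine_m()` and `pairwise_distances()` are hypothetical helpers (not part of any package), the Earth is treated as a sphere of radius 6,371,000 m so the results will differ slightly from `geosphere::distm()`, and the files are written to `tempdir()` purely for illustration. It avoids the formula form of `split()` and the `\()` shorthand, so it should also run on R 4.0.

```r
# Hypothetical helper: great-circle (haversine) distance in meters.
# Assumes a spherical Earth with radius r = 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: all pairwise distances between one client's houses.
pairwise_distances <- function(df) {
  if (nrow(df) < 2) return(NULL)   # a single house has no pairs
  idx <- combn(nrow(df), 2)        # 2 x n_pairs matrix of row indices
  i <- idx[1, ]
  j <- idx[2, ]
  data.frame(
    Doc_titular = df$Doc_titular[i],
    House_ID_1  = df$House_ID[i],
    House_ID_2  = df$House_ID[j],
    Meters      = haversine_m(df$longitude[i], df$latitude[i],
                              df$longitude[j], df$latitude[j]),
    stringsAsFactors = FALSE
  )
}

# Small sample copied from the question's example data.
sample_data <- data.frame(
  Doc_titular = c("26DF5756F", "26DF5756F", "26DF5756F",
                  "45GH7765B", "45GH7765B"),
  House_ID    = c("AAA/BA/00145", "AAA/BA/00146", "AAA/BA/00733",
                  "AAA/BA/00123", "AAA/BA/00498"),
  longitude   = c(-3.36715925514947, -3.36687508416913, -3.37604382639631,
                  -3.36887798896237, -3.37077717656959),
  latitude    = c(3.80089929185657, 3.80092746460019, 3.80126114282085,
                  3.80405033823961, 3.80121749925945),
  stringsAsFactors = FALSE
)

# Splitting by a column vector instead of a formula.
by_client <- split(sample_data, sample_data$Doc_titular)
results   <- lapply(by_client, pairwise_distances)

# Write one CSV per client, named after the client ID.
out_dir <- tempdir()
for (client in names(results)) {
  if (!is.null(results[[client]])) {
    write.csv(results[[client]],
              file.path(out_dir, paste0(client, ".csv")),
              row.names = FALSE)
  }
}
```

Here `Meters` stays numeric, so the CSV can be sorted or filtered by distance directly. Replacing `out_dir` with a real folder gives one `<Doc_titular>.csv` file per client.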
ktiu
  • Thanks @ktiu
    I've tried your code and I get this message: `Error: unexpected input in: " map_dfr(~ combn( 2, simplify = FALSE) %>% map_dfr(~ do.call(\"` I've never used purrr, so I have no clue what the message means (I even tried your full example, loading your data). Thanks in advance
    – Jammoyano Jun 27 '21 at 20:50
  • You're getting this error because you're running an old version of R. In my code, I used the [anonymous function notation](https://www.r-bloggers.com/2021/05/new-features-in-r-4-1-0/#shorthand-syntax-for-anonymous-functions) introduced in R v4.1. You can solve the issue by upgrading to the newest version of R, or by substituting the backslash notation `\()` by `function()`. – ktiu Jun 28 '21 at 08:54
  • Thanks @ktiu. I'm using R v4.0. Changing to `function()` got rid of that message. Now I get `Error in unique.default(x, nmax = nmax) : unique() only used with vectors`. Any idea? (My original message is not in English.) Thanks in advance (again). – Jammoyano Jun 28 '21 at 13:09
  • Since you didn't provide reproducible data I had to make assumptions about your data types. This is why it is a good idea to include sample data in your question (with `dput()`) – ktiu Jun 28 '21 at 14:07
  • Sorry @ktiu, but the data are too sensitive to post on the internet. – Jammoyano Jun 28 '21 at 16:07
  • Error solved. Instead of `~ Doc_Titular` in `split()`, I've created a list with the values of Doc_Titular using `select()` .... Now I'm getting this error from `geosphere::distm()`: `Error in .pointsToMatrix(x) : longitude < -360`. So now let's read about geosphere and EPSG:4326. Thanks in advance !! – Jammoyano Jul 04 '21 at 14:38
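
The `longitude < -360` message in the last comment suggests that at least one longitude value is not a plausible number of decimal degrees. The cause in the asker's data is unknown. Possible explanations (assumptions, not confirmed in the thread) include a decimal separator lost on import or coordinates stored in a projected system rather than in degrees. A minimal sketch for locating such rows before computing any distances, where `check_coords()` is a hypothetical helper and the example values are made up:

```r
# Hypothetical helper: return the rows whose coordinates are missing or
# fall outside the valid decimal-degree ranges (|lon| <= 180, |lat| <= 90).
check_coords <- function(df) {
  bad_lon <- is.na(df$longitude) | abs(df$longitude) > 180
  bad_lat <- is.na(df$latitude)  | abs(df$latitude)  > 90
  df[bad_lon | bad_lat, , drop = FALSE]
}

# Made-up example: row B has a broken longitude, row C a broken latitude.
example <- data.frame(
  House_ID  = c("A", "B", "C"),
  longitude = c(-3.367, -336715.9, -3.376),
  latitude  = c(3.80, 3.80, 380126.1),
  stringsAsFactors = FALSE
)

bad <- check_coords(example)
print(bad)  # inspect these rows before calling a distance function
```

If this returns any rows, those are the ones to fix or convert first.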