
I am trying to get the latitude and longitude corresponding to a given pincode in India.

For the pincodes, I have the following file.

https://data.gov.in/sites/default/files/all_india_PO_list_without_APS_offices_ver2_lat_long.csv

The data has 15 columns. I show only part of it here so that you can see what the data looks like.

                   officename pincode officeType Deliverystatus divisionname   regionname
 1:             Achalapur B.O  504273        B.O       Delivery     Adilabad    Hyderabad
 2:                   Ada B.O  504293        B.O       Delivery     Adilabad    Hyderabad
 3:               Adegaon B.O  504307        B.O       Delivery     Adilabad    Hyderabad
 4: Adilabad Collectorate S.O  504001        S.O   Non-Delivery     Adilabad    Hyderabad
 5:              Adilabad H.O  504001        H.O       Delivery     Adilabad    Hyderabad

This file has multiple lat-long pairs mapped to a single pincode.

For my use, I need one lat-long pair per pincode (I have two addresses, X and Y), and I then use the Haversine formula to calculate the distance between X and Y.

Possible options for me:

  1. Take an average of the lat-long values for each pincode, then map them and calculate the Haversine distance between X and Y.
  2. Use geocode. I tried this, but I get the following error, mainly because I am behind an office firewall:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [maps.googleapis.com] Connection timed out after 10000 milliseconds

  3. Use any other source on the net, or any other way, to get a 1:1 mapping between pincode and lat-long.

Any help is appreciated!

jazzurro
Ravi
  • I have a hard time understanding what you are trying to achieve. As far as I can see, pincode 207001 has 61 data points, while some pincodes have only one. One more thing: I do not see any values in longitude and latitude in the data. If this is the case, how can we get averaged long and lat for each pincode? Could you explain more? – jazzurro Jan 06 '20 at 13:06
  • The values are present for some states if you filter the data. So basically my ask is this: I have a dataset with two addresses (or two pincodes; I haven't put the sample here), and I have to calculate the distance between them. I thought the best approach would be to take the lat and long of those pincodes and then apply the Haversine distance formula to them. Does the question make sense now? – Ravi Jan 06 '20 at 16:58
  • If you are asking whether average lat long is applicable here, that should depend on the specific use case you have in mind. What do you want to do with the distances and what is your tolerance for accuracy? – Arani Oct 06 '20 at 11:45

2 Answers


Here is what I tried for you. Your data is called mydf here. First, keep the rows that have values in both longitude and latitude. Then, for each group defined by statename and pincode, find the average longitude and latitude. This creates foo.

library(dplyr)
library(tidyr)
library(purrr)

filter(mydf, complete.cases(latitude) & complete.cases(longitude)) %>% 
group_by(statename, pincode) %>% 
summarize(ave_long = mean(longitude),
          ave_lat = mean(latitude)) -> foo
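
To make the result of this step concrete, here is a minimal base-R sketch of the same filter-then-average idea. The rows in this toy mydf are made up for illustration and are not taken from the real file; aggregate() stands in for the dplyr pipeline above.

```r
# Toy data with made-up coordinates (NOT taken from the real file).
mydf <- data.frame(
  statename = c("KARNATAKA", "KARNATAKA", "KARNATAKA", "KARNATAKA"),
  pincode   = c(560001L, 560001L, 560003L, 560003L),
  latitude  = c(12.0, 13.0, 13.0, NA),
  longitude = c(77.0, 78.0, 77.5, 77.5)
)

# Keep only rows where both coordinates are present.
complete <- mydf[!is.na(mydf$latitude) & !is.na(mydf$longitude), ]

# One averaged coordinate pair per (statename, pincode) group.
foo <- aggregate(cbind(longitude, latitude) ~ statename + pincode,
                 data = complete, FUN = mean)
names(foo) <- c("statename", "pincode", "ave_long", "ave_lat")

print(foo)
```

Each pincode ends up with exactly one row, which is the 1:1 mapping the question asks for; whether a plain mean is an appropriate summary depends on the use case.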

The next step is to arrange foo so that we can calculate Haversine distances. Following the approach in the post linked below, we create all possible pairwise combinations of the data points.

# Arrange this data in a way that we can calculate Haversine.
# We basically create all possible combinations of rows.
# This post gave me a hand: https://community.rstudio.com/t/create-all-possible-combinations-of-a-data-frame/26848/4

myrows <- foo %>%
          group_by_all() %>%
          group_split()

out <- t(combn(x = 1:nrow(foo), m = 2)) %>%
       as_tibble() %>%
       mutate_all(~ map(., ~ pluck(myrows, .x))) %>% 
       unnest() %>% 
       setNames(nm = c("start_state", "start_pincode",
                       "start_long", "start_lat",
                       "dest_state", "dest_pincode",
                       "dest_long", "dest_lat"))
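
For comparison, the same pairing can be sketched in base R by indexing foo with the row numbers that combn() returns. This is only an illustrative alternative under the assumption that foo has the four columns created above; the toy foo below uses made-up values.

```r
# Toy stand-in for foo (made-up averaged coordinates).
foo <- data.frame(
  statename = c("KARNATAKA", "KARNATAKA", "KARNATAKA"),
  pincode   = c(560001L, 560003L, 560004L),
  ave_long  = c(77.60, 77.57, 77.57),
  ave_lat   = c(12.98, 13.00, 12.94)
)

# Each row of idx holds the row numbers of one unordered pair.
idx <- t(combn(nrow(foo), 2))

# Look up the start and destination rows for every pair.
start <- foo[idx[, 1], ]
dest  <- foo[idx[, 2], ]
names(start) <- c("start_state", "start_pincode", "start_long", "start_lat")
names(dest)  <- c("dest_state", "dest_pincode", "dest_long", "dest_lat")
rownames(start) <- NULL
rownames(dest)  <- NULL

# Bind the two halves side by side: one row per pair, eight columns.
out <- cbind(start, dest)
print(out)
```

With n rows in foo this yields n * (n - 1) / 2 pairs, so the table grows quickly as the number of pincodes increases.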

We could use distHaversine() or distGeo() from the geosphere package. But let's try something new: SymbolixAU wrote another function, linked below. Thank you, SymbolixAU!

# https://stackoverflow.com/questions/36817423/how-to-efficiently-calculate-distance-between-pair-of-coordinates-using-data-tab/42014364#42014364

dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
                    radians <- pi/180
                    lat_to <- lat_to * radians
                    lat_from <- lat_from * radians
                    lon_to <- lon_to * radians
                    lon_from <- lon_from * radians
                    dLat <- (lat_to - lat_from)
                    dLon <- (lon_to - lon_from)
                    a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
                    return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
                  }
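
A quick sanity check of this function, as a sketch: two points one degree of latitude apart on the same meridian should be r * pi / 180 apart, which is about 111.3 km for the default radius. The function is repeated here only so that the snippet runs on its own.

```r
# Copy of the function above, repeated so this snippet is self-contained.
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  radians <- pi / 180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat / 2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon / 2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}

# One degree of latitude along the same meridian, in meters.
d <- dt.haversine(lat_from = 0, lon_from = 0, lat_to = 1, lon_to = 0)
print(d)

# The result should equal the arc length r * (pi / 180).
stopifnot(abs(d - 6378137 * pi / 180) < 1e-6)

# Identical points are zero meters apart.
stopifnot(dt.haversine(13, 77.6, 13, 77.6) == 0)
```

Because r is given in meters, every distance the function returns is in meters as well.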

The final step is to calculate the distances. Since r is given in meters, the distance column is in meters too.

mutate(out,
       distance = dt.haversine(lon_from = start_long, lat_from = start_lat,
                               lon_to = dest_long, lat_to = dest_lat)) -> result

# A tibble: 6,105 x 9
#   start_state start_pincode start_long start_lat dest_state dest_pincode dest_long dest_lat distance
#   <chr>               <int>      <dbl>     <dbl> <chr>             <int>     <dbl>    <dbl>    <dbl>
# 1 KARNATAKA          560001       77.6      13.0 KARNATAKA        560003      77.6     13.0    3544.
# 2 KARNATAKA          560001       77.6      13.0 KARNATAKA        560004      77.6     12.9    4554.
# 3 KARNATAKA          560001       77.6      13.0 KARNATAKA        560005      77.6     13.0    3178.
# 4 KARNATAKA          560001       77.6      13.0 KARNATAKA        560008      77.6     13.0    4844.
# 5 KARNATAKA          560001       77.6      13.0 KARNATAKA        560010      77.6     13.0    4618.
# 6 KARNATAKA          560001       77.6      13.0 KARNATAKA        560011      77.6     12.9    5510.
# 7 KARNATAKA          560001       77.6      13.0 KARNATAKA        560013      77.6     13.1    9491.
# 8 KARNATAKA          560001       77.6      13.0 KARNATAKA        560014      77.5     13.1   12047.
# 9 KARNATAKA          560001       77.6      13.0 KARNATAKA        560017      77.7     13.0    6831.
#10 KARNATAKA          560001       77.6      13.0 KARNATAKA        560021      77.6     13.0    5148.
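
If only specific address pairs are needed rather than every combination, a lookup may be enough. The sketch below assumes foo holds exactly one row per pincode; the pairs table, its column names pin_x and pin_y, and all coordinate values are hypothetical.

```r
# Toy stand-in for foo: one averaged coordinate pair per pincode (made-up values).
foo <- data.frame(
  pincode  = c(560001L, 560003L, 560004L),
  ave_long = c(77.60, 77.57, 77.57),
  ave_lat  = c(12.98, 13.00, 12.94)
)

# Hypothetical pairs of pincodes for addresses X and Y.
# 999999 is deliberately absent from foo to show what a failed lookup looks like.
pairs <- data.frame(
  pin_x = c(560001L, 560003L),
  pin_y = c(560004L, 999999L)
)

# match() gives the row position of each pincode in foo (NA when not found).
ix <- match(pairs$pin_x, foo$pincode)
iy <- match(pairs$pin_y, foo$pincode)

# Attach the looked-up coordinates; unmatched pincodes yield NA.
pairs$start_long <- foo$ave_long[ix]
pairs$start_lat  <- foo$ave_lat[ix]
pairs$dest_long  <- foo$ave_long[iy]
pairs$dest_lat   <- foo$ave_lat[iy]

print(pairs)
```

The four coordinate columns can then be passed to dt.haversine() in the same way as in the mutate() call above; rows with NA coordinates produce NA distances.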
jazzurro
  • Thanks! I wanted to know whether taking an average of the latitudes and longitudes makes sense, or whether there is any other aggregation method. – Ravi Jan 07 '20 at 04:27
  • @Ravi I am not an expert in this area, so I have no clue. I think you need to decide whether this is the right approach based on your research question. – jazzurro Jan 07 '20 at 04:28
  • For my use, I am relying on the accuracy of the calculated distance. A difference of ~2-5 km is a huge deal. Compared with Google Maps, there is a lot of variation in the distance. I thought averaging would solve the problem, but apparently not. – Ravi Jan 07 '20 at 04:36
  • @Ravi The only thing I can think of is to use another function to calculate the distance. Otherwise I have no clue, I'm afraid. – jazzurro Jan 07 '20 at 04:47
  • The problem is that I am not getting an accurate lat and long for a pincode. If I had that, the Haversine distance would work perfectly. – Ravi Jan 07 '20 at 04:52
  • @Ravi I am afraid I do not know what you are really trying to achieve. In addition, you are comparing this result with what Google Maps returns, which implies that you know the answer you want. I do not know how Google Maps calculates distances, and I am not sure the results are comparable unless you know how Google calculates them. It seems to me that I cannot help you from this point. – jazzurro Jan 07 '20 at 04:56

Lat/long-based distances will never match Google distances: Google calculates the path distance, whereas any mathematical formula applied to lat/long values gives a straight-line distance (as the crow flies).