
I have an Excel workbook with two tabs. Each tab has a ZIP code column, a latitude column, a longitude column, and a ZIP code type column. For every PO box or university ZIP code, I am trying to find the closest regular ZIP code. I want to use R to compute the distance from each PO/university ZIP to every regular ZIP and return the PO/university ZIP together with its nearest regular ZIP (the replacement ZIP). I already tried to have Excel calculate the values, but the dataset is too big: I have 29557 regular ZIP codes by 10019 PO/university ZIP codes.

So far, I have converted the Excel sheets to CSV and loaded them into R as matrices and data frames, but I am now at a dead end.

Any ideas on how to accomplish this? I can elaborate on my problem if needed. Any help is appreciated, thank you.

Edit:

    real_zip <- read.csv(file = "Real_ZIP_Codes.csv", header = TRUE, row.names = 1, sep = ",")
    po_zip <- read.csv(file = "PO_ZIP_Codes.csv", header = TRUE, row.names = 1, sep = ",")
    # Defining some variables. Unneeded ones are commented out
    # real_zip_dataframe <- data.frame(abs(real_zip))
    # po_zip_dataframe <- data.frame(po_zip)
    real_zip_matrix <- as.matrix(real_zip)
    po_zip_matrix <- as.matrix(po_zip)
    real_zip_list <- as.list(real_zip)
    po_zip_list <- as.list(po_zip)
    real_zip_matrix_lat <- as.matrix(abs(real_zip$LAT))
    real_zip_matrix_long <- as.matrix(abs(real_zip$LONG))
    po_zip_matrix_lat <- as.matrix(abs(po_zip$LAT))
    po_zip_matrix_long <- as.matrix(abs(po_zip$LONG))
    LatCalc <- mapply(diff, real_zip_matrix_lat, po_zip_matrix_lat, SIMPLIFY = FALSE)
    LongCalc <- mapply(diff, real_zip_matrix_long, po_zip_matrix_long, SIMPLIFY = FALSE)

    test <- apply(LatCalc, 1:2, ((real_zip_matrix_lat) - (po_zip_matrix_lat)))
    diff_lat <- real_zip_matrix_lat - po_zip_matrix_lat
    diff_long <- diff(po_zip_matrix_long, real_zip_matrix_long)

This is my script as a whole so far.

Some more info:

    > nrow(po_zip)
    [1] 10019
    > nrow(real_zip)
    [1] 29557

When I run:

    > diff_long <- diff(real_zip_matrix_long, po_zip_matrix_long)

I get this error:

    Error in diff.default(real_zip_matrix_long, po_zip_matrix_long) :
      'lag' and 'differences' must be integers >= 1

When I run:

    > test <- apply(LatCalc, 1:2, real_zip_matrix_lat - as.vector(po_zip_matrix_lat))

I get this error:

    Error in match.fun(FUN) :
      'real_zip_matrix_lat - as.vector(po_zip_matrix_lat)' is not a function, character or symbol
    In addition: Warning message:
    In real_zip_matrix_lat - as.vector(po_zip_matrix_lat) :
      longer object length is not a multiple of shorter object length
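
Both errors can be reproduced on toy vectors, which makes it easier to see what `diff` and `apply` expect. The sketch below is an illustration only: `real_lat` and `po_lat` are hypothetical stand-ins for the latitude columns above, and base R's `outer` is shown as one possible way to get every pairwise difference, not as a complete solution.

```r
# Hypothetical toy latitudes standing in for real_zip$LAT and po_zip$LAT
real_lat <- c(40.0, 41.5, 42.0)
po_lat   <- c(40.2, 41.9)

# diff() works on ONE vector; its second argument is `lag`, a single
# integer >= 1. It returns differences between elements of that vector.
lagged <- diff(real_lat, lag = 1)   # 41.5 - 40.0 and 42.0 - 41.5

# Passing a second data vector in the `lag` position raises the
# "'lag' and 'differences' must be integers >= 1" error; capture its message.
diff_error <- tryCatch(
  diff(real_lat, po_lat),
  error = function(e) conditionMessage(e)
)

# apply() needs a function as its third argument, not an already
# computed value such as `a - b`.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, function(row) sum(row))

# outer() builds the full table of pairwise differences:
# rows correspond to po_lat, columns to real_lat.
pairwise <- outer(po_lat, real_lat, FUN = "-")
print(dim(pairwise))
```

Note that a full pairwise table for 10019 by 29557 rows would hold roughly 296 million numbers, so `outer` on the complete data is likely to be too large for memory on many machines.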
  • Welcome to StackOverflow. Please read the following tips on producing a [minimum example](http://stackoverflow.com/help/mcve) and this post on providing a [reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Then edit your question accordingly. – lmo May 24 '16 at 13:02
  • I have updated the post, sorry about forgetting to post code. And thank you for the welcome. I can update the post with more info if needed, I'm honestly unsure of what info is needed at this point because I've hit this dead end. I'm not new to programming, but I'm new to R. – Delete_System_32 May 24 '16 at 13:29
  • Maybe the next step is providing some sample data, and also showing a table of your desired output for this data. For example, provide 2 university zips and some sample of other zips, maybe 20 using `dput`. – lmo May 24 '16 at 13:35
  • Read `?diff`. The second argument is typically the number of lags, which might not make sense in your application. Also, take a look at `apply`, the second argument there is the margin, an integer, where 1 is "rows", 2 is "columns" and higher values correspond to multidimensional array margins. It does not accept a vector of length 2. – lmo May 24 '16 at 13:38
  • It may be a better idea to break this problem up into smaller pieces and ask questions corresponding to each piece as you move along. – lmo May 24 '16 at 13:40
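
Following the suggestion to break the problem into pieces, here is a minimal sketch of one possible decomposition: a distance function, then a loop that finds the nearest regular ZIP for each PO/university ZIP. It is not the asker's code. It assumes a great-circle (haversine) distance is acceptable, that coordinates are kept signed in decimal degrees rather than passed through `abs()`, and that each data frame has columns named `ZIP`, `LAT`, and `LONG` (the `ZIP` column name and the helper names `haversine_km` and `nearest_real_zip` are hypothetical). The toy rows use placeholder labels, not real ZIP codes.

```r
# Great-circle (haversine) distance in km between points given in
# decimal degrees. 6371 km is an approximate mean Earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each PO/university row, find the index of the nearest regular ZIP.
# Looping over PO rows keeps only one vector of nrow(real) distances in
# memory at a time instead of a full nrow(po) x nrow(real) matrix.
# Assumes no missing coordinates (which.min would return integer(0)).
nearest_real_zip <- function(po, real) {
  idx <- vapply(seq_len(nrow(po)), function(i) {
    which.min(haversine_km(po$LAT[i], po$LONG[i], real$LAT, real$LONG))
  }, integer(1))
  data.frame(
    po_zip          = po$ZIP,
    replacement_zip = real$ZIP[idx],
    distance_km     = haversine_km(po$LAT, po$LONG,
                                   real$LAT[idx], real$LONG[idx]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy data with placeholder labels instead of real ZIP codes
real <- data.frame(ZIP  = c("R1", "R2", "R3"),
                   LAT  = c(40.75, 41.89, 37.77),
                   LONG = c(-73.99, -87.62, -122.41),
                   stringsAsFactors = FALSE)
po <- data.frame(ZIP  = c("P1", "P2"),
                 LAT  = c(40.71, 37.78),
                 LONG = c(-74.01, -122.42),
                 stringsAsFactors = FALSE)

result <- nearest_real_zip(po, real)
print(result)
```

If the ZIP codes were read in as row names (as with `row.names = 1` in the script above), `rownames(po)` and `rownames(real)` could be used in place of the assumed `ZIP` column.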
