
Error in data.frame(zipcode_a, zipcode_b, distance) : arguments imply differing number of rows: 32019, 400.

I am attempting to calculate the distance between 32,019 ZIP codes and one location (ZIP code 94063). I have an Excel sheet with two columns. Both columns are vectors of length 32019, but I keep getting an error that the arguments imply differing number of rows when running zip_distance(Zip_Codes_Only$zip_a, Zip_Codes_Only$zip_b, units = "miles").

Does anyone have a fix for this, or an alternative method to calculate the distance between ZIP codes? I have tried mapdist but ran into similar issues.

[Screenshot: imported Excel ZIP code data]

Dave2e
hoodwench
  • It is possible that zip_distance does not know how to handle the ZIP+4 convention, so that could be the problem. What does `head(Zip_Codes_Only, 10)` show? – Dave2e Dec 22 '21 at 23:17
  • Hello @Dave2e thank you for the comment! It looks like the zip_distance function was only calculating the distance for unique zip codes instead of every row. I need a way to run it so that it does it for each line and the data frame spits out one output per line. I think a mutate or loop could do this but I have no experience with either. Do you have any thoughts? – hoodwench Dec 24 '21 at 00:12
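Following up on the ZIP+4 point and on the data coming from Excel: a minimal sketch, not part of zipcodeR, that normalizes ZIP codes to 5-character strings before they reach zip_distance(). It assumes the columns may contain ZIP+4 values such as "94063-1234" or plain numbers whose leading zeros Excel dropped; `normalize_zip` is a hypothetical helper name.

```r
# Hypothetical helper (not part of zipcodeR): normalize ZIP codes to
# 5-character strings before passing them to zip_distance().
# Assumes values may be ZIP+4 ("94063-1234" or "940631234") or numbers
# whose leading zeros were dropped by Excel (2134 instead of "02134").
normalize_zip <- function(zip) {
  zip <- trimws(as.character(zip))
  zip <- sub("-.*$", "", zip)                 # drop a "-XXXX" ZIP+4 suffix
  nine <- grepl("^[0-9]{9}$", zip)            # ZIP+4 written without the hyphen
  zip[nine] <- substr(zip[nine], 1, 5)
  out <- rep(NA_character_, length(zip))      # anything unrecognizable stays NA
  ok <- grepl("^[0-9]{1,5}$", zip)
  out[ok] <- sprintf("%05d", as.integer(zip[ok]))  # restore leading zeros
  out
}
```

Usage would be `Zip_Codes_Only$zip_b <- normalize_zip(Zip_Codes_Only$zip_b)`; any row that comes back NA does not look like a 5-digit ZIP code and is worth checking by hand.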

1 Answer


Wow, this package has some undocumented behavior that conflicts with its help pages. These functions are not properly vectorized, so one is required to loop through the dataset row by row to obtain the correct answer.

#Sample data
zip_a <- c("94063") 
zip_b <- c("94306", "94301", "94030", "94030",  "95409")
Zip_Codes_Only <- data.frame(zip_a, zip_b)


library(zipcodeR)
#loop through each row of the dataset and return the calculated distance
distance <- sapply(seq_len(nrow(Zip_Codes_Only)), function(i) {
   temp <- zip_distance(Zip_Codes_Only$zip_a[i], Zip_Codes_Only$zip_b[i], units = "miles")
   temp$distance
})
#add the distance column to the original dataset
cbind(Zip_Codes_Only, distance)

Here is an example of the logic error produced when passing a vector to zip_distance(). According to the help page:

zip_distance(zipcode_a, zipcode_b, lonlat = TRUE, units = "miles")
Arguments:
zipcode_a First vector of ZIP codes
zipcode_b Second vector of ZIP codes

zip_c <- c("94306", "94301")
zip_distance(zip_a, zip_c, units = "miles")
# zipcode_a zipcode_b distance
# 1     94063     94306     5.29
# 2     94063     94301     7.61

zip_d <- c("94301", "94306")
zip_distance(zip_a, zip_d, units = "miles")
# zipcode_a zipcode_b distance
# 1     94063     94301     5.29
# 2     94063     94306     7.61

Notice that zip_d is zip_c in reverse order, yet the distance column in the two resulting data frames is identical, so in at least one of the calls the distances are attached to the wrong ZIP code pairs.
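Since the question also asked for an alternative method: a sketch that avoids the row-by-row loop by looking up each ZIP code's coordinates once and then computing haversine (great-circle) distances for all rows in one vectorized expression. This is a swapped-in technique, not what zip_distance() does internally, so values may differ slightly from its output. `haversine_miles` and `zip_pair_distance` are hypothetical helpers, and `coords` is assumed to be a table with columns `zipcode`, `lat`, `lng` in decimal degrees.

```r
# Hypothetical helper: great-circle (haversine) distance in miles between
# points given in decimal degrees. Fully vectorized; radius is the mean
# Earth radius in miles (about 6371 km).
haversine_miles <- function(lat1, lng1, lat2, lng2, radius = 3958.8) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlng <- (lng2 - lng1) * rad
  h <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(h)))   # pmin guards against rounding above 1
}

# Hypothetical helper: one distance per row, keeping the input order.
# `coords` is assumed to have columns zipcode, lat, lng (decimal degrees).
zip_pair_distance <- function(zip_a, zip_b, coords) {
  ia <- match(zip_a, coords$zipcode)   # NA where a ZIP is not in `coords`
  ib <- match(zip_b, coords$zipcode)
  haversine_miles(coords$lat[ia], coords$lng[ia],
                  coords$lat[ib], coords$lng[ib])
}
```

Assuming zipcodeR's geocode_zip() returns such a table, usage would look like `coords <- geocode_zip(unique(c(Zip_Codes_Only$zip_a, Zip_Codes_Only$zip_b)))` followed by `Zip_Codes_Only$distance <- zip_pair_distance(Zip_Codes_Only$zip_a, Zip_Codes_Only$zip_b, coords)`. Because the lookup goes through match(), row order and duplicate ZIP codes are preserved, and a ZIP code missing from `coords` gives NA for that row instead of an error.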

Dave2e
  • Thank you so much Dave2e! I'm attempting to expand your solution to the full data set (over 40,000 ZIP codes) and am encountering the following error: Error in p[, 3:4, drop = FALSE] : subscript out of bounds. Any ideas why this could be happening? I appreciate all your time and help with this. – hoodwench Dec 24 '21 at 02:47
  • It could be caused by an invalid (or not in the database) zip code in the second column. – Dave2e Dec 24 '21 at 02:58
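Building on the answer's loop and the last comment: a sketch that wraps each zip_distance() call in tryCatch(), so a single invalid ZIP code yields NA for its row instead of stopping the whole run. `row_distances` is a hypothetical helper, and treating an empty result as NA assumes zip_distance() may return zero rows for an unknown ZIP code rather than always raising an error.

```r
# Hypothetical helper: one distance per row, with NA (plus a warning) for
# rows where the lookup fails, e.g. a ZIP code missing from zipcodeR's data.
# dist_fun defaults to zipcodeR::zip_distance(); it is an argument only so
# the lookup can be swapped out.
row_distances <- function(zip_a, zip_b,
                          dist_fun = function(a, b) {
                            zipcodeR::zip_distance(a, b, units = "miles")$distance
                          }) {
  stopifnot(length(zip_a) == length(zip_b))
  vapply(seq_along(zip_a), function(i) {
    tryCatch({
      d <- dist_fun(zip_a[i], zip_b[i])
      # assumption: anything other than exactly one value means "no match"
      if (length(d) == 1) as.numeric(d) else NA_real_
    }, error = function(e) {
      warning(sprintf("row %d (%s -> %s): %s", i, zip_a[i], zip_b[i],
                      conditionMessage(e)), call. = FALSE)
      NA_real_
    })
  }, numeric(1))
}
```

Usage would be `Zip_Codes_Only$distance <- row_distances(Zip_Codes_Only$zip_a, Zip_Codes_Only$zip_b)`, after which `Zip_Codes_Only[is.na(Zip_Codes_Only$distance), ]` lists the rows whose ZIP codes need checking.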