Error in zip_distance: arguments imply differing number of rows

Question

Error in data.frame(zipcode_a, zipcode_b, distance) : arguments imply differing number of rows: 32019, 400.

I am attempting to calculate the distance between 32019 ZIP codes and one location (ZIP code 94063). I have an Excel sheet with two columns. Both columns are vectors of length 32019, but I keep getting an error that the arguments imply differing number of rows when running zip_distance(Zip_Codes_Only$zip_a, Zip_Codes_Only$zip_b, units = "miles").

Does anyone have a fix for this, or an alternative method to calculate the distance between ZIP codes? I have tried mapdist but ran into similar issues.

[Image: imported Excel ZIP code data]

It is possible that zip_distance does not know how to handle the ZIP+4 convention, so that could be the problem. What does `head(Zip_Codes_Only, 10)` show? — Dave2e, Dec 22 '21 at 23:17
Hello @Dave2e thank you for the comment! It looks like the zip_distance function was only calculating the distance for unique zip codes instead of every row. I need a way to run it so that it does it for each line and the data frame spits out one output per line. I think a mutate or loop could do this but I have no experience with either. Do you have any thoughts? — hoodwench, Dec 24 '21 at 00:12

score 0 · Answer 1 · answered Dec 24 '21 at 01:03

Wow, this package has some undocumented behavior that conflicts with the help pages. These functions are not vectorized, so one has to loop through the dataset row by row to obtain the correct answer.

# Sample data: zip_a has length 1 and is recycled by data.frame() to match zip_b
zip_a <- c("94063")
zip_b <- c("94306", "94301", "94030", "94030", "95409")
Zip_Codes_Only <- data.frame(zip_a, zip_b)

library(zipcodeR)
# Loop through each row of the dataset and return the calculated distance
distance <- sapply(seq_len(nrow(Zip_Codes_Only)), function(i) {
  temp <- zip_distance(Zip_Codes_Only$zip_a[i], Zip_Codes_Only$zip_b[i], units = "miles")
  temp$distance
})
# Append the distance column to the original dataset
cbind(Zip_Codes_Only, distance)
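With 32019 rows and many repeated ZIP codes, the row-by-row loop repeats a lot of identical lookups. A possible variation, offered only as a sketch, is to call the distance function once per unique pair and then map the results back to every row. The helper name `distance_by_pair` and its `dist_fn` argument are made up for this illustration; `dist_fn` is assumed to behave like `zip_distance` for single ZIP codes and return a data frame with a `distance` column.

```r
# Sketch: one distance lookup per unique (zip_a, zip_b) pair.
# `distance_by_pair` and `dist_fn` are hypothetical names; `dist_fn` is assumed
# to return a data frame with a `distance` column when given two single ZIP codes.
distance_by_pair <- function(df, dist_fn, units = "miles") {
  zip_a <- as.character(df$zip_a)
  zip_b <- as.character(df$zip_b)
  pairs <- unique(data.frame(zip_a, zip_b, stringsAsFactors = FALSE))

  # Scalar call for each unique pair, avoiding the vector behavior of zip_distance
  pairs$distance <- mapply(
    function(a, b) dist_fn(a, b, units = units)$distance[1],
    pairs$zip_a, pairs$zip_b,
    USE.NAMES = FALSE
  )

  # Map each original row back to its pair so row order and row count are kept
  idx <- match(paste(zip_a, zip_b), paste(pairs$zip_a, pairs$zip_b))
  df$distance <- pairs$distance[idx]
  df
}

# Only runs when zipcodeR is installed
if (requireNamespace("zipcodeR", quietly = TRUE)) {
  Zip_Codes_Only <- data.frame(
    zip_a = "94063",
    zip_b = c("94306", "94301", "94030", "94030", "95409")
  )
  print(distance_by_pair(Zip_Codes_Only, zipcodeR::zip_distance))
}
```

The result has one row per input row, which is what the question asks for, while the repeated "94030" is only looked up once.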

Here is an example of the logic error produced when passing a vector to the zip_distance() function. As per the help page:

zip_distance(zipcode_a, zipcode_b, lonlat = TRUE, units = "miles")
Arguments:
zipcode_a First vector of ZIP codes
zipcode_b Second vector of ZIP codes

zip_c <- c("94306", "94301")
zip_distance(zip_a, zip_c, units = "miles")
# zipcode_a zipcode_b distance
# 1     94063     94306     5.29
# 2     94063     94301     7.61

zip_d <- c("94301", "94306")
zip_distance(zip_a, zip_d, units = "miles")
# zipcode_a zipcode_b distance
# 1     94063     94301     5.29
# 2     94063     94306     7.61

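Notice that zip_d is the reverse ordering of zip_c, but the distance column in the resulting data frame is in the same order. At least one of the two results therefore pairs the distances with the wrong ZIP codes.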

Thank you so much Dave2e! I'm attempting to expand your solution to the full data set (over 40,000 ZIP codes) and am encountering the following error: Error in p[, 3:4, drop = FALSE] : subscript out of bounds. Any ideas why this could be happening? I appreciate all your time and help with this. — hoodwench, Dec 24 '21 at 02:47
It could be caused by an invalid (or not in the database) zip code in the second column. — Dave2e, Dec 24 '21 at 02:58
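If an invalid or missing ZIP code is indeed the cause, one way to keep the loop running and find the offending rows is to wrap each call in `tryCatch()` and return `NA` on failure. This is a minimal sketch, not a confirmed fix: `safe_distance` and `dist_fn` are hypothetical names, and "00000" is a made-up code assumed to be absent from the database.

```r
# Sketch: return NA instead of stopping when a single lookup fails.
# `safe_distance` and `dist_fn` are hypothetical names; `dist_fn` is assumed to
# behave like zipcodeR::zip_distance for two single ZIP codes.
safe_distance <- function(a, b, dist_fn, units = "miles") {
  tryCatch(
    {
      d <- dist_fn(a, b, units = units)$distance
      # Guard against an empty or missing distance column
      if (length(d) >= 1) as.numeric(d[1]) else NA_real_
    },
    error = function(e) NA_real_
  )
}

# Only runs when zipcodeR is installed
if (requireNamespace("zipcodeR", quietly = TRUE)) {
  zip_b <- c("94306", "00000")  # "00000" is assumed to be an invalid code
  distance <- vapply(
    zip_b,
    function(b) safe_distance("94063", b, zipcodeR::zip_distance),
    numeric(1)
  )
  # Names of the result are the ZIP codes, so this lists the ones that failed
  print(names(distance)[is.na(distance)])
}
```

Rows that come back as `NA` can then be inspected separately, for example to check for ZIP+4 values or codes with dropped leading zeros.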
