
I have a data frame of positions where a bus had zero speed (was standing still). I want to determine whether each stop was due to traffic conditions or because the bus stopped at a bus stop. I have a function, in_circle, that calculates the distance from the center of a bus stop to any other position. If the bus stopped within 20 meters of the center of a bus stop, I set stop_type to 1 and move on to the next point at which a bus stopped.

The code below works, but I have a large amount of data and the two nested for loops take quite a while to run. Is there a more efficient way to write it?

Edit: I added a picture of some rows of the data. https://i.stack.imgur.com/Ekvqv.png

for(i in seq_len(NROW(df_bus_h_z))){
  # Save current latitude and longitude of the bus
  cur_lat <- df_bus_h_z[i, "latitude"]
  cur_lon <- df_bus_h_z[i, "longitude"]
  # cur_trip was not defined in the posted snippet; assumed to be the trip_id of this row
  cur_trip <- df_bus_h_z[i, "trip_id"]
  # Control boolean
  stop_found <- FALSE
  # Search through all bus stops
  for(j in seq_len(NROW(df_stop_all))){
    if(df_stop_all[j,"trip_id"] == cur_trip){
      # If the bus stopped at a bus stop
      if(in_circle(df_stop_all[j,"stop_lat"],df_stop_all[j,"stop_lon"], cur_lat, cur_lon) <= 20){
        df_bus_h_z[i, "stop_type"] <- 1
        df_bus_h_z[i, "stop_id"] <- df_stop_all[j,"stop_id"]
        stop_found <- TRUE
        break
      }
    }
  }
  # No bus stop within 20 meters: the bus stopped for another reason
  if(!stop_found){
    df_bus_h_z$stop_type[i] <- 0
  }
}
Lasarus9
Charlotte
  • There are definitely more efficient ways using vectorized functions. Could you provide an example of your data? – csgroen Feb 08 '19 at 15:15
  • Look into `fuzzyjoin::geo_join`. That should be much faster, and lets you define the max distance from any of the bus stop locations. http://varianceexplained.org/fuzzyjoin/reference/geo_join.html – Jon Spring Feb 08 '19 at 15:18
  • @csgroen, I added a link to a picture. – Charlotte Feb 08 '19 at 15:23
  • @JonSpring, thank you, I will look into that! – Charlotte Feb 08 '19 at 15:24
  • [You should not post code/data as an image because...](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) – Parfait Feb 08 '19 at 15:38

1 Answer

Consider merging the two data sets, which is often the vectorized counterpart of nested for loops over two data sets. Then run an ifelse on the return value of the unknown function, in_circle.

Please note that the code below is untested, since there is no reproducible example or desired result. If you need to keep all records from either set, adjust the all argument (see ?merge). Also adjust the join fields as needed with the by or by.x/by.y arguments.

mdf <- merge(df_bus_h_z, df_stop_all, by="trip_id")      # MAYBE SUBSET BY UNKNOWN cur_trip?

mdf <- within(mdf, {
    cond <- in_circle(stop_lat, stop_lon, latitude, longitude) <= 20

    stop_type <- ifelse(cond, 1, 0)               # NEW COLUMN
    bus_stop_id <- ifelse(cond, stop_id, NA)      # NEW COLUMN (POSSIBLY REDUNDANT)
})
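
Two things the snippet above leaves open: in_circle has to accept whole columns (be vectorized), and the merge returns one row per bus position and stop pair rather than one row per bus position. Below is a minimal sketch of one way to handle both, assuming in_circle returns a distance in meters. The haversine implementation, the row_id helper column, and the toy coordinates are hypothetical stand-ins, not the asker's actual function or data.

```r
# Hypothetical vectorized stand-in for in_circle(): haversine distance in meters.
# The real in_circle() was not posted, so this is only an assumption about its behavior.
in_circle <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy data, made up for illustration only
df_bus_h_z <- data.frame(
  trip_id   = c(1, 1, 2),
  latitude  = c(59.33000, 59.34000, 59.35000),
  longitude = c(18.06000, 18.07000, 18.08000)
)
df_stop_all <- data.frame(
  trip_id  = c(1, 1, 2),
  stop_id  = c("A", "B", "C"),
  stop_lat = c(59.33005, 59.40000, 59.36000),
  stop_lon = c(18.06005, 18.10000, 18.09000),
  stringsAsFactors = FALSE
)

# Helper key so results can be mapped back to the original rows
df_bus_h_z$row_id <- seq_len(nrow(df_bus_h_z))

# One row per (bus position, stop on the same trip) pair
mdf <- merge(df_bus_h_z, df_stop_all, by = "trip_id")
mdf$dist <- in_circle(mdf$stop_lat, mdf$stop_lon, mdf$latitude, mdf$longitude)

# Keep only the nearest stop for each bus position
mdf <- mdf[order(mdf$row_id, mdf$dist), ]
nearest <- mdf[!duplicated(mdf$row_id), ]
hit <- nearest$dist <= 20

# Default to "not at a bus stop", then fill in the positions within 20 meters
df_bus_h_z$stop_type <- 0
df_bus_h_z$stop_id <- NA_character_
df_bus_h_z$stop_type[nearest$row_id[hit]] <- 1
df_bus_h_z$stop_id[nearest$row_id[hit]] <- nearest$stop_id[hit]
```

Note that this picks the nearest stop within 20 meters, whereas the original loop takes the first one it finds; the two only differ when several stops on the same trip lie within 20 meters of the same position. Bus positions whose trip_id has no stops are dropped by the merge but keep the default stop_type of 0.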
Parfait