Assume the data below:

OriginId, OriginName, DestinationId, DestinationName,Time
1        ,   Origin 1,   1       ,          Destination 1 , 20
1        ,   Origin 1,   2       ,          Destination 2 , 25
2        ,   Origin 2,   3       ,          Destination 3 , 14
2        ,   Origin 2,   4       ,          Destination 4 , 29

This is a CSV file that holds travel times between origins and destinations. I want to find the closest destination for each origin. In other words, I need to group the data by OriginId, rank the rows within each group by Time, and keep the rows with rank 1. The desired result for the data above is:

OriginId, OriginName, DestinationId, DestinationName,Time(Minute)
1        ,   Origin 1,   1       ,          Destination 1 , 20
2        ,   Origin 2,   3       ,          Destination 3 , 14

Which R function do I need to use after group by?

Shahin

1 Answer

Using dplyr, we can group by 'OriginId', get the index of the row with the minimum 'Time' using which.min, and extract that row with slice.

library(dplyr)
df1 %>%
  group_by(OriginId) %>%
  slice(which.min(Time))
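
For a reproducible check, here is a minimal sketch that builds df1 from the sample data in the question and applies the same pipeline. The column types are an assumption, since the question does not show how the CSV is read, and the name closest is only illustrative.

```r
library(dplyr)

# Sample data from the question; column types are assumed
df1 <- data.frame(
  OriginId        = c(1, 1, 2, 2),
  OriginName      = c("Origin 1", "Origin 1", "Origin 2", "Origin 2"),
  DestinationId   = c(1, 2, 3, 4),
  DestinationName = paste("Destination", 1:4),
  Time            = c(20, 25, 14, 29)
)

# Keep the row with the smallest Time within each OriginId
closest <- df1 %>%
  group_by(OriginId) %>%
  slice(which.min(Time)) %>%
  ungroup()

closest
```

Note that which.min returns only the first minimum, so an origin with tied times still yields a single row. If all tied rows should be kept, filter(Time == min(Time)) in place of the slice step is one alternative.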

Alternatively, with data.table, convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'OriginId', get the row index with which.min (as in the previous case) and use it to subset the rows of each group's data (.SD).

library(data.table)
setDT(df1)[, .SD[which.min(Time)], by = OriginId]
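
The data.table call can be tried on the question's sample data with a sketch like the following. Again, the column types are assumed and closest is only an illustrative name.

```r
library(data.table)

# Sample data from the question; column types are assumed
df1 <- data.frame(
  OriginId        = c(1, 1, 2, 2),
  OriginName      = c("Origin 1", "Origin 1", "Origin 2", "Origin 2"),
  DestinationId   = c(1, 2, 3, 4),
  DestinationName = paste("Destination", 1:4),
  Time            = c(20, 25, 14, 29)
)

# setDT converts df1 to a data.table by reference (no copy is made);
# .SD holds the rows of the current OriginId group
closest <- setDT(df1)[, .SD[which.min(Time)], by = OriginId]

closest
```

After this runs, df1 itself is a data.table, and the grouping column OriginId appears first in the result.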
akrun
  • Thanks, would it be possible to add a bit more detail to your answer? – Shahin Dec 09 '15 at 12:37
  • @shaahin Sorry, a JavaScript problem in my browser prevented me from updating the answer with a description. It is updated now. – akrun Dec 09 '15 at 13:59