select the row by comparing values in one column in R

Question

I have a data frame which looks like :

n4= 
    sector turb    dist
    1  sector1  T02  828.66
    2  sector1  T04 1114.58
    3  sector1  T05 1012.22
    4  sector2  T03  992.64
    5  sector2  T05 1012.22
    6  sector2  T06 1158.38
    7  sector3  T03  992.64
    8 sector12  T02  828.66
    9 sector12  T04 1114.58

I would like to keep the rows with unique sector name and the measure is keeping the one with minimum values in dist column :

 sector turb   dist
1  sector1  T02 828.66
4  sector2  T03 992.64
7  sector3  T03 992.64
8 sector12  T02 828.66

I know that I have to group them base on sector :

result = n4 %>%
dplyr::group_by(sector)

But then using select or filter command did not worked as I tried :

result = n4 %>%
    dplyr::group_by(sector)%>%
    dplyr::select(which.min(dist))

Any idea how could I do it ?

Jilber Urbina · Accepted Answer · 2018-09-28T15:54:06.697

You can use filter instead of select and as alternative to slice

> n4 %>%
    dplyr::group_by(sector)%>%
    dplyr::filter(dist==min(dist))
# A tibble: 4 x 3
# Groups:   sector [4]
  sector   turb   dist
  <fct>    <fct> <dbl>
1 sector1  T02    829.
2 sector2  T03    993.
3 sector3  T03    993.
4 sector12 T02    829.

If you prefer using R base, try with aggregate

> aggregate(.~sector, data=n4, min)
    sector turb   dist
1  sector1    1 828.66
2 sector12    1 828.66
3  sector2    2 992.64
4  sector3    2 992.64

You can check this answer for further alternatives to accomplish this task.

akrun · Answer 2 · 2018-09-28T15:49:03.567

We need slice instead of select to subset the rows. The select function is to select the columns of the dataset. If the order of 'sector' should be the same as in the order of occurrence of 'sector' in the input data, then change the column to a factor with levels specified in the input data order

n4 %>%       
   dplyr::group_by(sector = factor(sector, levels = unique(sector)))%>%
   dplyr::slice(which.min(dist))
# A tibble: 4 x 3
# Groups:   sector [4]
#  sector   turb   dist
#  <fct>    <chr> <dbl>
#1 sector1  T02    829.
#2 sector2  T03    993.
#3 sector3  T03    993.
#4 sector12 T02    829.

Or using base R

n4[with(n4, ave(dist, sector, FUN = min) == dist),]
#     sector turb   dist
#1  sector1  T02 828.66
#4  sector2  T03 992.64
#7  sector3  T03 992.64
#8 sector12  T02 828.66

data

n4 <- structure(list(sector = c("sector1", "sector1", "sector1", "sector2", 
"sector2", "sector2", "sector3", "sector12", "sector12"), turb = c("T02", 
"T04", "T05", "T03", "T05", "T06", "T03", "T02", "T04"), dist = c(828.66, 
 1114.58, 1012.22, 992.64, 1012.22, 1158.38, 992.64, 828.66, 1114.58
)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9"))

thank you for the reply, but here The sorting of the sector column changes to : `sector1, sector12, sector2, sector3`, how could I fix that one ? — Ali Hadjihoseini, Sep 28 '18 at 15:41
@AliHadjihoseini One option is to convert it to a `factor` column with `levels` specified. Updated the post — akrun, Sep 28 '18 at 15:44

select the row by comparing values in one column in R

2 Answers2

data