r select multiple rows for variables based on maximum and minimum values of one column

Question

I have a data frame like the one below:

I would like to find the maximum and minimum 'probability' values for 'years' 2017. Then, for whichever topics hold that maximum and minimum, all rows of those topics (from every year) should be gathered in another data frame like the one below:

(In the above example, topic V16 has the minimum probability of all in 2017 and V30 has the maximum probability.)

Please add [a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — pogibas, Jan 30 '18 at 07:24

akrun · Accepted Answer · 2018-01-30T08:13:45.407

We can use the tidyverse. If we need all the rows of each 'topics' value whose 'probability' is the max/min within 'years' 2017 only, then

library(dplyr)
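df1 %>%
    # take the max/min over the 2017 rows only, then keep every row of those topics
    filter(topics %in% topics[probability == max(probability[years == 2017]) & years == 2017] |
           topics %in% topics[probability == min(probability[years == 2017]) & years == 2017])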
# A tibble: 4 x 3
#   years topics probability
#   <int> <chr>        <dbl>
#1  2016 V16         0.0300
#2  2016 V30         0.0219
#3  2017 V16         0.0130
#4  2017 V30         0.0714

Or use slice

df1 %>%
   # rows of the max topic first, then rows of the min topic
   slice(c(which(topics %in% topics[probability == max(probability[years == 2017]) & years == 2017]),
       which(topics %in% topics[probability == min(probability[years == 2017]) & years == 2017])))
# A tibble: 4 x 3
#   years topics probability
#   <int> <chr>        <dbl>
#1  2016 V30         0.0219
#2  2017 V30         0.0714
#3  2016 V16         0.0300
#4  2017 V16         0.0130

Or using base R

subset(df1, topics %in% subset(df1, years == 2017 & 
            probability %in% range(probability[years == 2017]), select = "topics")[[1]])
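The same logic can be spelled out step by step. The following is only a sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration, not taken from the question), chosen so that the overall maximum falls in 2016. It shows why the extremes have to be computed from the 2017 rows rather than from the whole column.

```r
# Hypothetical toy data (not from the question): the overall maximum
# probability (0.90) belongs to 2016, so it must not be picked up.
toy <- data.frame(
  years       = c(2016L, 2016L, 2016L, 2017L, 2017L, 2017L),
  topics      = c("A", "B", "C", "A", "B", "C"),
  probability = c(0.90, 0.20, 0.30, 0.10, 0.50, 0.40),
  stringsAsFactors = FALSE
)

# Step 1: restrict to the year of interest.
toy17 <- toy[toy$years == 2017, ]

# Step 2: topics holding the minimum and maximum probability in that year.
extreme_topics <- toy17$topics[toy17$probability %in% range(toy17$probability)]

# Step 3: keep every row (all years) belonging to those topics.
result <- toy[toy$topics %in% extreme_topics, ]
print(result)
```

Here topic A holds the 2017 minimum and topic B the 2017 maximum, so the result keeps the 2016 and 2017 rows of A and B and drops C.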

data

df1 <- structure(list(years = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), topics = c("V1", "V10", "V11", "V12", "V13", "V14", "V15", 
"V16", "V17", "V18", "V19", "V2", "V20", "V21", "V22", "V23", 
"V24", "V25", "V26", "V27", "V28", "V29", "V3", "V30", "V4", 
"V5", "V6", "V7", "V8", "V9", "V1", "V10", "V11", "V12", "V13", 
"V14", "V15", "V16", "V17", "V18", "V19", "V2", "V20", "V21", 
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V3", 
"V30", "V4", "V5", "V6", "V7", "V8", "V9"), probability = c(0.045, 
0.0553, 0.03038, 0.0454189, 0.0347, 0.0278, 0.0164, 0.030016, 
0.0205, 0.0212, 0.0434, 0.0506, 0.0376, 0.019, 0.04, 0.033, 0.019, 
0.0499, 0.0204, 0.049, 0.02044, 0.03, 0.0207, 0.0219, 0.035, 
0.019, 0.044, 0.037, 0.0327, 0.046, 0.021, 0.03015, 0.028, 0.0299, 
0.015, 0.0439, 0.0378, 0.013, 0.0241, 0.0454, 0.0226, 0.0207, 
0.0258, 0.0237, 0.063, 0.027, 0.018, 0.058, 0.0255, 0.0172, 0.0576, 
0.0706, 0.035, 0.0714, 0.0266, 0.0228, 0.0183, 0.0265, 0.0376, 
0.0409)), .Names = c("years", "topics", "probability"), 
 class = "data.frame", row.names = c(NA, 
-60L))

Onyambu · Answer 2 · 2018-01-30T08:06:06.910

You can try

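library(data.table)
# df is the data frame from the question; setDT() converts it to a data.table by reference
a <- setDT(df)[years == 2017, topics[c(which.min(probability), which.max(probability))], by = years]
# keep every row (all years) of the two selected topics
subset(df, topics %in% a$V1)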

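In base R, you can do something like:

# positions of the max and min within the 2017 rows
a <- aggregate(probability ~ years, subset(df, years == 2017), function(x) c(which.max(x), which.min(x)))
# the positions are relative to the 2017 subset, so look the topics up there
subset(df, topics %in% subset(df, years == 2017)$topics[c(a$probability)])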
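One caveat with `which.min()` / `which.max()` is that they return only the first matching position, so ties are silently dropped. A minimal sketch of this, assuming a hypothetical helper `select_extreme_topics()` and made-up toy data (neither comes from the question):

```r
# Hypothetical helper; the name and arguments are illustrative only.
select_extreme_topics <- function(data, year) {
  in_year <- data[data$years == year, ]
  # which.min()/which.max() give the first position only, so a tied
  # minimum or maximum contributes just one topic.
  keep <- in_year$topics[c(which.min(in_year$probability),
                           which.max(in_year$probability))]
  data[data$topics %in% keep, ]
}

# Toy data with a tie: A and C share the 2017 minimum (0.10).
toy <- data.frame(
  years       = c(2016L, 2016L, 2016L, 2017L, 2017L, 2017L),
  topics      = c("A", "B", "C", "A", "B", "C"),
  probability = c(0.90, 0.20, 0.30, 0.10, 0.50, 0.10),
  stringsAsFactors = FALSE
)

res <- select_extreme_topics(toy, 2017)
print(res)
```

With this data only A (the first minimum) and B (the maximum) are kept. An approach based on `%in% range(...)` or `==` would also keep C, so which one is preferable depends on how ties should be treated.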

edited Jan 30 '18 at 08:06

answered Jan 30 '18 at 07:57
