I have a data frame like the one below:

[image: data frame with 'years', 'topics' and 'probability' columns]

I would like to find the maximum and minimum 'probability' values for the year 2017. Then all rows for the topics that hold those maximum and minimum values should be gathered in another data frame, like this:

[image: expected result, containing every row of the two selected topics]

(In the example above, topic V16 has the minimum probability of all topics in 2017 and V30 has the maximum.)

taurian
    Please add [a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – pogibas Jan 30 '18 at 07:24

2 Answers

We can use dplyr from the tidyverse. To get every row of the 'topics' whose 'probability' is the max/min within 'years' 2017 only:

library(dplyr)
df1 %>%
    # take the min/max over the 2017 rows only, then keep every row of those topics
    filter(topics %in% topics[years == 2017 &
               probability %in% range(probability[years == 2017])])
# A tibble: 4 x 3
#   years topics probability
#   <int> <chr>        <dbl>
#1  2016 V16         0.0300
#2  2016 V30         0.0219
#3  2017 V16         0.0130
#4  2017 V30         0.0714

Or use slice

df1 %>%
   # max(...) and min(...) are restricted to the 2017 rows
   slice(c(which(topics %in% topics[years == 2017 &
                     probability == max(probability[years == 2017])]),
       which(topics %in% topics[years == 2017 &
                     probability == min(probability[years == 2017])])))
# A tibble: 4 x 3
#   years topics probability
#   <int> <chr>        <dbl>
#1  2016 V30         0.0219
#2  2017 V30         0.0714
#3  2016 V16         0.0300
#4  2017 V16         0.0130

Or using base R

subset(df1, topics %in% subset(df1, years == 2017 & 
            probability %in% range(probability[years == 2017]), select = "topics")[[1]])
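
As a minimal sketch of the same idea with the year scoping spelled out step by step, here is a hypothetical base R helper. The name extreme_topics(), the toy data frame, and its values are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper: return every row of the topics that hold the minimum
# or the maximum probability within a single year.
extreme_topics <- function(data, year) {
  in_year <- data[data$years == year, ]
  # which.min()/which.max() return the first position when there are ties
  keep <- in_year$topics[c(which.min(in_year$probability),
                           which.max(in_year$probability))]
  data[data$topics %in% keep, ]
}

# Made-up toy data in which the overall extremes fall in 2016, not 2017
toy <- data.frame(
  years       = c(2016L, 2016L, 2016L, 2017L, 2017L, 2017L),
  topics      = c("A", "B", "C", "A", "B", "C"),
  probability = c(0.90, 0.01, 0.50, 0.20, 0.40, 0.30),
  stringsAsFactors = FALSE
)

result <- extreme_topics(toy, 2017)
print(result)
```

In this toy data the 2017 minimum belongs to topic A and the 2017 maximum to topic B, so all four rows of A and B are returned. A comparison against the unscoped max(probability) would match no 2017 row here, which is why the minimum and maximum are computed on the 2017 rows only.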

data

df1 <- structure(list(years = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L
), topics = c("V1", "V10", "V11", "V12", "V13", "V14", "V15", 
"V16", "V17", "V18", "V19", "V2", "V20", "V21", "V22", "V23", 
"V24", "V25", "V26", "V27", "V28", "V29", "V3", "V30", "V4", 
"V5", "V6", "V7", "V8", "V9", "V1", "V10", "V11", "V12", "V13", 
"V14", "V15", "V16", "V17", "V18", "V19", "V2", "V20", "V21", 
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V3", 
"V30", "V4", "V5", "V6", "V7", "V8", "V9"), probability = c(0.045, 
0.0553, 0.03038, 0.0454189, 0.0347, 0.0278, 0.0164, 0.030016, 
0.0205, 0.0212, 0.0434, 0.0506, 0.0376, 0.019, 0.04, 0.033, 0.019, 
0.0499, 0.0204, 0.049, 0.02044, 0.03, 0.0207, 0.0219, 0.035, 
0.019, 0.044, 0.037, 0.0327, 0.046, 0.021, 0.03015, 0.028, 0.0299, 
0.015, 0.0439, 0.0378, 0.013, 0.0241, 0.0454, 0.0226, 0.0207, 
0.0258, 0.0237, 0.063, 0.027, 0.018, 0.058, 0.0255, 0.0172, 0.0576, 
0.0706, 0.035, 0.0714, 0.0266, 0.0228, 0.0183, 0.0265, 0.0376, 
0.0409)), .Names = c("years", "topics", "probability"), 
 class = "data.frame", row.names = c(NA, 
-60L))
akrun

You can try

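library(data.table)
# df is the data frame from the question; setDT() converts it to a data.table by reference
# a$V1 holds the topics with the 2017 minimum and maximum probability
a <- setDT(df)[years == 2017, topics[c(which.min(probability), which.max(probability))], by = years]
subset(df, topics %in% a$V1)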

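In base R, you can do something like:

d17 <- subset(df, years == 2017)
# positions of the max and min within the 2017 rows
a <- aggregate(probability ~ years, d17, function(x) c(which.max(x), which.min(x)))
# look those positions up in the 2017 rows, not in the full data frame
subset(df, topics %in% d17$topics[c(a$probability)])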
Onyambu