
I have a data frame like the one below:

x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
df<-data.frame(x,y,z)

df
    x y z
1   3 a 2
2   2 a 2
3   1 a 2
4   8 b 1
5   7 b 1
6  11 c 3
7  10 c 3
8   9 c 3
9   7 c 3
10  5 c 3
11  4 c 3

I want to select the top n rows for each group of column y, where n is given in column z. So the output should look like this:

Output:
   x y z
1  3 a 2
2  2 a 2
3  8 b 1
4 11 c 3
5 10 c 3
6  9 c 3
David Arenburg
vsb
  • Would the `z` values always be the same within a group? What if they differ? How should `n` be selected then? – Ronak Shah Jul 10 '17 at 08:08
  • The z values are always the same within a group, so n is 2 for group "a", 1 for "b" and 3 for "c". – vsb Jul 10 '17 at 08:10
    `library(dplyr); df %>% group_by(y) %>% slice(1:z[1])` should work. – HNSKD Jul 10 '17 at 08:13
  • @HNSKD your code works, but what if two values in column x are the same within a group? – vsb Jul 10 '17 at 08:23
  • in that case you can add `unique()` before `slice()` – Prem Jul 10 '17 at 08:29
    Thanks @Prem and @HNSKD, `df %>% group_by(y) %>% unique() %>% slice(1:z[1])` is what I was looking for. – vsb Jul 10 '17 at 08:49
  • @vsb You've received a few good answers below. If one of them worked for you, please consider accepting it by clicking on the check mark to the left of the answer. This lets the community know the answer solved your issue and that the issue should be closed. – CPak Sep 09 '17 at 02:03
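A minimal base R sketch of what `unique()` changes here, using hypothetical data in which group "a" contains the row `x = 3` twice. Because `y` and `z` are constant within a group, rows with the same `x` are exact duplicates, so `unique()` on the whole data frame removes them before the first z rows are taken. Like `slice(1:z[1])`, the helper `top_z()` (a hypothetical name) assumes the rows are already sorted by decreasing `x` within each group.

```r
# Hypothetical data: group "a" contains a duplicated row (x = 3)
d <- data.frame(x = c(3, 3, 2, 1, 8, 7),
                y = c("a", "a", "a", "a", "b", "b"),
                z = c(2, 2, 2, 2, 1, 1))

# Keep the first z rows of each group (assumes x is already sorted within y)
top_z <- function(data) {
  parts <- lapply(split(data, data$y), function(g) g[seq_len(g$z[1]), ])
  do.call(rbind, parts)
}

without_dedup <- top_z(d)          # group "a" keeps x = 3, 3
with_dedup    <- top_z(unique(d))  # group "a" keeps x = 3, 2
print(with_dedup)
```

Without `unique()`, group "a" returns the value 3 twice; with it, the rows kept for "a" are x = 3 and 2.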

4 Answers


A solution with base R:

# df is split by y; within each piece, rows are ordered by decreasing x
# and only the first z rows are kept, then everything is rbind-ed back together:
do.call(rbind, 
        lapply(split(df, df$y), 
               function(df1) df1[order(df1$x, decreasing=TRUE), ][1:unique(df1$z), ]))
#     x y z
#a.1  3 a 2
#a.2  2 a 2
#b    8 b 1
#c.6 11 c 3
#c.7 10 c 3
#c.8  9 c 3
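One thing to watch with `[1:unique(df1$z), ]`: if z is larger than the number of rows in a group, the index runs past the end and R fills in rows of `NA`. A sketch with hypothetical data (group "b" has a single row but `z = 3`), swapping in `head()`, which never returns more rows than the group has; `top_z_head()` is an illustrative name, not part of the answer:

```r
# Hypothetical data: group "b" has only one row but asks for z = 3 rows
d <- data.frame(x = c(3, 2, 1, 8),
                y = c("a", "a", "a", "b"),
                z = c(2, 2, 2, 3))

top_z_head <- function(data) {
  parts <- lapply(split(data, data$y), function(g) {
    g <- g[order(g$x, decreasing = TRUE), ]  # largest x first
    head(g, g$z[1])                          # stops at the group size
  })
  do.call(rbind, parts)
}

res <- top_z_head(d)
print(res)

# For comparison, indexing with 1:z pads group "b" with NA rows
g <- d[d$y == "b", ]
padded <- g[1:g$z[1], ]
```

`res` contains no `NA` rows, while `padded` has three rows, two of them entirely `NA`.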

EDIT:
A much more direct way (still in base R), provided in a comment by @mt1022 (it assumes x is already sorted in decreasing order within each group):

df[ave(1:nrow(df), df$y, FUN = seq_along) <= df$z, ]
#   x y z
#1  3 a 2
#2  2 a 2
#4  8 b 1
#6 11 c 3
#7 10 c 3
#8  9 c 3
Cath
    Another base R solution: `df[ave(1:nrow(df), df$y, FUN = seq_along) <= as.numeric(df$z), ]` – mt1022 Jul 10 '17 at 08:31
  • @mt1022 I should get more familiar with `ave` ;-), this is much better base R solution, you should post that – Cath Jul 10 '17 at 08:33
  • It is adapted from this one: https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame. I just added a filter step. Maybe you can add it to your answer as an alternative. – mt1022 Jul 10 '17 at 08:35
  • @mt1022 thanks, I'll add it but feel free to change your mind and post and I'll delete the edit ;-) – Cath Jul 10 '17 at 08:38
  • @mt1022 but this assumes the data are sorted in _descending_ order. – Ronak Shah Jul 10 '17 at 08:46
    @RonakShah, sure. If the data are not already in the right order, an additional `order(-as.numeric(df$x))` is required. – mt1022 Jul 10 '17 at 08:49
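Following the ordering caveat in these comments, a base R sketch that first scrambles the question's data (the permutation is arbitrary, for illustration only), then sorts by `y` and decreasing `x` with `order()`, and only then applies the `ave()` filter:

```r
x <- c(3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4)
y <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c")
z <- c(2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3)
df <- data.frame(x, y, z)

# Arbitrary reordering so that x is no longer sorted within each group
shuffled <- df[c(3, 9, 1, 5, 11, 6, 2, 4, 10, 8, 7), ]

# Sort by group, then by decreasing x, before numbering rows within groups
sorted <- shuffled[order(shuffled$y, -shuffled$x), ]
keep <- ave(seq_len(nrow(sorted)), sorted$y, FUN = seq_along) <= sorted$z
result <- sorted[keep, ]
print(result)
```

The rows kept are the same six as in the expected output, even though the input was not sorted.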

One approach with data.table:

library(data.table)
setDT(df)
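# Number rows within each y group, flag the first z of them,
# then keep the flagged rows and drop the flag column (column 2)
df[, .(inc = seq_len(.N) <= z, x, z), by = .(y)][inc == TRUE, -2]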
#   y  x z
#1: a  3 2
#2: a  2 2
#3: b  8 1
#4: c 11 3
#5: c 10 3
#6: c  9 3
Cath
Erdem Akkas

A solution with dplyr that uses `do()`:

library(dplyr)
# do() is superseded as of dplyr 1.0.0 but still works
df %>%
   group_by(y) %>%
   do(head(., as.numeric(unique(.$z))))
CPak

I'm posting the dplyr solution I was looking for. It is based on @HNSKD's comment:

library(dplyr)
x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)

df<-data.frame(x,y,z)

df %>% group_by(y) %>% slice(1:2)

Which returns the first two rows for each y (n is fixed at 2 here; `slice(1:z[1])` takes n from column z instead):

# A tibble: 6 x 3
# Groups:   y [3]
      x y         z
  <dbl> <fct> <dbl>
1     3 a         2
2     2 a         2
3     8 b         1
4     7 b         1
5    11 c         3
6    10 c         3
Pablo Casas