2

I have a dataframe like the one I've simplified below. I want to first select rows with the same value based on column X, then in that selection select rows with the same value based on column Y. Then from that selection, I want to take the minimal value. I'm now using a forloop, but seems there must be an easier way. Thanks!

set.seed(123)    
data<-data.frame(X=rep(letters[1:3], each=8),Y=rep(c(1,2)),Z=sample(1:100, 12))
data
   X Y  Z
1  a 1 76
2  a 1 22
3  a 2 32
4  a 2 23
5  b 1 14
6  b 1 40
7  b 2 39
8  b 2 35
9  c 1 15
10 c 1 13
11 c 2 21
12 c 2 42

Desired outcome:

   X Y  Z
2  a 1 22
4  a 2 23
5  b 1 14
8  b 2 35
10 c 1 13
11 c 2 21
joffie
  • 205
  • 2
  • 12

4 Answers4

4

Here is a data.table solution:

library(data.table)
data = data.table(data)
data[, min(Z), by=c("X", "Y")]

EDIT based on OP's comment:

If there is a NA value in one of the columns we sort by, an additional row is created:

data[2,2] <-NA
data[, min(Z,na.rm = T), by=c("X", "Y")]

   X  Y V1
1: a  1 31
2: a NA 79
3: a  2 14
4: b  1 31
5: b  2 14
6: c  1 50
7: c  2 25
otwtm
  • 1,779
  • 1
  • 16
  • 27
  • Thank. Is there any downside of changing my dataframe to a data table? Anything I should consider... – joffie May 13 '20 at 13:57
  • No downside. A data.table is still a data.frame, meaning it can do all the things a data.frame can do, just with some more features. (Anyone correct me if I'm wrong) – otwtm May 13 '20 at 14:01
  • And do NAs get ignored? Seems like it does :) – joffie May 13 '20 at 14:08
  • Hi, I edited the answer with a NA value. Is that what you meant? – otwtm May 13 '20 at 14:21
  • Sorry, I meant a NA value in column V1. That it takes the minimal value, ignoring that NA. But I think the initial code was correct – joffie May 14 '20 at 14:27
2
library(tidyverse)
data %>%
  group_by(X, Y) %>%
  summarise(Z = min(Z))

Will do the trick! The other answer right now is the data.table way, this is tidyverse. Both are extremely powerful ways to approach data cleaning & manipulation - it could be helpful to familiarize yourself with one!

ila
  • 709
  • 4
  • 15
2

In base you can use aggregate to get min from Z grouped by the remaining columns like:

aggregate(Z~.,data,min)
#  X Y  Z
#1 a 1 31
#2 b 1 31
#3 c 1 50
#4 a 2 14
#5 b 2 14
#6 c 2 25

In case there is an NA in the groups:

data[2,2] <-NA

Ignore it:

aggregate(Z~.,data,min)
#  X Y  Z
#1 a 1 31
#2 b 1 31
#3 c 1 50
#4 a 2 14
#5 b 2 14
#6 c 2 25

Show it:

aggregate(data$Z, list(X=data$X, Y=addNA(data$Y)), min)
#  X    Y  x
#1 a    1 31
#2 b    1 31
#3 c    1 50
#4 a    2 14
#5 b    2 14
#6 c    2 25
#7 a <NA> 79
GKi
  • 37,245
  • 2
  • 26
  • 48
0

This code could benefit from splitting it up in multiple lines, but it works. in Base-R

do.call(rbind,
lapply(unlist(lapply(split(data,data$X), function(x) split(x,x$Y)),recursive=F), function(y) y[y$Z==min(y$Z),])
)

    X Y  Z
a.1 a 1 31
a.2 a 2 14
b.1 b 1 31
b.2 b 2 14
c.1 c 1 50
c.2 c 2 25
Daniel O
  • 4,258
  • 6
  • 20