-1

My (made-up) data:

dat <- structure(list(animal = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), 
    oxygen = c(25L, 24L, 28L, 30L, 25L, 30L, 28L, 27L, 20L, 22L, 
    20L, 27L, 26L, 24L, 26L, 22L, 30L, 25L, 26L, 28L, 27L, 30L, 
    27L, 28L, 28L, 20L, 23L, 29L), time = c(49L, 33L, 2L, 22L, 
    15L, 22L, 49L, 40L, 11L, 2L, 24L, 48L, 32L, 18L, 39L, 46L, 
    6L, 24L, 26L, 40L, 26L, 26L, 1L, 36L, 4L, 17L, 50L, 24L), 
    habitat = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
    1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 
    1L, 1L, 2L, 2L), .Label = c("clean", "dirty"), class = "factor")), .Names = c("animal", 
"oxygen", "time", "habitat"), class = "data.frame", row.names = c(NA, 
-28L))

Variable explanations:

animal: There are 4 individual animals (A, B, C, and D) tested for oxygen consumption.

oxygen: Oxygen consumption rate; each animal was measured multiple times.

time: The time (measured in minutes) since a machine started to measure oxygen consumption.

habitat: Indicates the habitat condition from which an animal was collected; clean or dirty (polluted) habitat.

What I want to test (with a t-test) is whether the mean oxygen consumption rates differ between animals from clean and dirty (polluted) habitats. However, I want to restrict my analysis to the lowest one-third of oxygen consumption values for each animal, taken between 5 and 48 minutes.

Could anyone please provide R code that subsets my data to contain only the lowest one-third of the oxygen consumption rates for each animal AND only the rates taken between 5 and 48 minutes?

I am trying something like this, but the following code does not do what I want (I think it selects the lowest one-third from ALL data, not the lowest one-third for each animal):

newdat <- subset(dat, oxygen <= quantile(oxygen, 1/3) & time >= 5 & time <= 48)
Metrics
  • Please check this [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). A good reproducible example will help others tackle your question a lot more easily. – CHP Oct 18 '13 at 02:52

3 Answers

2

Something like:

library(plyr)
newdat <- ddply(dat, "animal",
      subset,
        oxygen <= quantile(oxygen, 1/3) & time >= 5 & time <=48)

##    animal oxygen time habitat
## 1       A     25   15   clean
## 2       A     20   11   clean
## 3       B     24   33   clean
## 4       B     24   18   clean
## 5       B     20   17   clean
## 6       C     20   24   dirty
## 7       C     26   39   dirty
## 8       C     26   26   dirty
## 9       D     27   40   dirty
## 10      D     27   48   dirty
## 11      D     22   46   dirty
Ben Bolker
  • After considering a reply from Codoremifa (please see our correspondence below), it looks like your code does not produce what I intended. Most likely, I misled you with my unclear question. Anyway, Codoremifa suggested some modifications, and with those modifications your code produced what I wanted. – kiyoshi sasaki Oct 20 '13 at 14:21
1

Edited: I misunderstood your question previously.

library(data.table)
dat <- data.table(dat)
subsetted <- dat[time < 48 & time > 5 , LowestOneThird := (oxygen <= quantile(oxygen, 1/3)), by = c('animal')][LowestOneThird == TRUE]

Output:

    > subsetted
   animal oxygen time habitat LowestOneThird
1:      A     20   11   clean           TRUE
2:      A     25   15   clean           TRUE
3:      B     20   17   clean           TRUE
4:      B     24   18   clean           TRUE
5:      B     24   33   clean           TRUE
6:      C     20   24   dirty           TRUE
7:      D     27   40   dirty           TRUE
8:      D     22   46   dirty           TRUE
TheComeOnMan
  • Thank you for responding to my question! Unfortunately, there seem to be two problems. First, while it gave correct answers for some animals (e.g., A and B), it produced incorrect answers for others (e.g., for animal C, the oxygen values are supposed to be 20, 26, 26). Second, I got an error message when I ran the code to make the "subsetted" object: Error in eval(expr, envir, enclos) : object 'LowestOneThird' not found. I tried to fix it but did not succeed. – kiyoshi sasaki Oct 20 '13 at 00:17
  • Sorry about the second error message, I see the problem and I will fix it once I clarify my doubt about how you choose your lowest one third. For `animal == "C" & time <= 48 & time >= 5`, the oxygen values are `c(20,26,26)`. The lowest one third (`quantile(c(20,26,26),1/3)`) therefore only consists of 20, and doesn't include 26. Is that not how you are calculating it? – TheComeOnMan Oct 20 '13 at 08:29
  • Thank you for your reply. Sorry, it looks like I was wrong. I sorted my data in Excel and found that your original output and explanation are correct (26 is not included). If you can fix the code to solve my error message problem, that would be very helpful. I thought that the other two helpers' code produced what I wanted, but it looks like your code produces the correct answer. Could you tell me why the other helpers' code does not produce the same output as yours? In their output, animal C has (20, 26, 26). – kiyoshi sasaki Oct 20 '13 at 13:36
  • I've edited the code so the error shouldn't occur now. As for the other answers, I suspect that writing the condition as `oxygen <= quantile(oxygen, 1/3) & time >= 5 & time <= 48` makes the code calculate the quantile over all the oxygen values for each animal, take the lowest one-third, and only then filter by time. Instead, `ddply(dat2, "animal", subset, oxygen <= quantile(oxygen, 1/3) & time >= 5 & time <= 48)`, where `dat2 <- subset(dat, time >= 5 & time <= 48)`, seems to produce the right result. – TheComeOnMan Oct 20 '13 at 13:59
  • Great! It worked. Your modifications to the ddply call worked as well. Thank you so much! – kiyoshi sasaki Oct 20 '13 at 14:08
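
The order-of-operations point raised in this comment thread (filter by time first, then take each animal's quantile) can be illustrated with a minimal base R sketch. It rebuilds the question's data in compact form; the names `in_window`, `cutoff`, and `lowest_third` are hypothetical, and the time bounds are assumed to be inclusive (5 to 48), whereas the data.table code above uses strict inequalities and so treats a measurement at exactly 48 minutes differently.

```r
# The question's data, rebuilt compactly (same values as the dput output).
dat <- data.frame(
  animal  = factor(rep(c("A", "B", "C", "D"), times = 7)),
  oxygen  = c(25, 24, 28, 30, 25, 30, 28, 27, 20, 22, 20, 27, 26, 24,
              26, 22, 30, 25, 26, 28, 27, 30, 27, 28, 28, 20, 23, 29),
  time    = c(49, 33, 2, 22, 15, 22, 49, 40, 11, 2, 24, 48, 32, 18,
              39, 46, 6, 24, 26, 40, 26, 26, 1, 36, 4, 17, 50, 24),
  habitat = factor(rep(c("clean", "clean", "dirty", "dirty"), times = 7))
)

# Step 1: keep only measurements inside the time window
# (assumption: bounds are inclusive).
in_window <- subset(dat, time >= 5 & time <= 48)

# Step 2: compute each animal's 1/3 quantile on the filtered rows only.
# ave() returns a vector aligned with the rows of in_window.
cutoff <- ave(in_window$oxygen, in_window$animal,
              FUN = function(x) quantile(x, 1/3, names = FALSE))

# Step 3: keep rows at or below their own animal's cutoff.
lowest_third <- in_window[in_window$oxygen <= cutoff, ]
print(lowest_third)
```

With this ordering, animal C keeps only the measurement of 20, in line with the `quantile(c(20,26,26),1/3)` reasoning given in the comments.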
0

You can use `by` from base R together with `do.call(rbind, ...)`:

dat1 <- with(dat,by(dat,animal,subset,oxygen <= quantile(oxygen, 1/3) & time >= 5 & time <=48))
> dat1
animal: A
  animal oxygen time habitat
5      A     25   15   clean
9      A     20   11   clean
----------------------------------------------------------------------------------------------------------- 
animal: B
   animal oxygen time habitat
2       B     24   33   clean
14      B     24   18   clean
26      B     20   17   clean
----------------------------------------------------------------------------------------------------------- 
animal: C
   animal oxygen time habitat
11      C     20   24   dirty
15      C     26   39   dirty
19      C     26   26   dirty
----------------------------------------------------------------------------------------------------------- 
animal: D
   animal oxygen time habitat
8       D     27   40   dirty
12      D     27   48   dirty
16      D     22   46   dirty


do.call(rbind,dat1)
     animal oxygen time habitat
A.5       A     25   15   clean
A.9       A     20   11   clean
B.2       B     24   33   clean
B.14      B     24   18   clean
B.26      B     20   17   clean
C.11      C     20   24   dirty
C.15      C     26   39   dirty
C.19      C     26   26   dirty
D.8       D     27   40   dirty
D.12      D     27   48   dirty
D.16      D     22   46   dirty
Metrics
  • I noticed that the code provided did not produce what I intended, as pointed out by Codoremifa above. Please see our correspondence above as well as my reply to Ben Bolker above. – kiyoshi sasaki Oct 20 '13 at 13:51
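
For completeness, here is a hedged sketch of a split-and-recombine approach in the same spirit as the `by` answer, but with the time filter applied before the per-animal quantile, followed by the t-test the question asks about. The names `in_window`, `pieces`, `newdat`, and `result` are hypothetical, the time bounds are assumed to be inclusive, and the call to `t.test` treats every retained row as an independent observation, which is an assumption of this sketch rather than something the thread establishes.

```r
# The question's data, rebuilt compactly (same values as the dput output).
dat <- data.frame(
  animal  = factor(rep(c("A", "B", "C", "D"), times = 7)),
  oxygen  = c(25, 24, 28, 30, 25, 30, 28, 27, 20, 22, 20, 27, 26, 24,
              26, 22, 30, 25, 26, 28, 27, 30, 27, 28, 28, 20, 23, 29),
  time    = c(49, 33, 2, 22, 15, 22, 49, 40, 11, 2, 24, 48, 32, 18,
              39, 46, 6, 24, 26, 40, 26, 26, 1, 36, 4, 17, 50, 24),
  habitat = factor(rep(c("clean", "clean", "dirty", "dirty"), times = 7))
)

# Filter on time first, so the quantile only sees in-window measurements.
in_window <- subset(dat, time >= 5 & time <= 48)

# Split by animal, keep each animal's lowest third, then stack the pieces.
pieces <- lapply(split(in_window, in_window$animal), function(d) {
  d[d$oxygen <= quantile(d$oxygen, 1/3, names = FALSE), ]
})
newdat <- do.call(rbind, pieces)

# Welch two-sample t-test of oxygen consumption by habitat
# (t.test() uses unequal variances by default).
result <- t.test(oxygen ~ habitat, data = newdat)
print(result)
```

The same `newdat` could be passed to any other test; the point of the sketch is only that the subsetting happens in two explicit steps before the comparison.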