
Say I want to take the subset of people aged 55 to 100 and look at their health care costs.

I've used:

Elders <- subset(midus, Age>= 55 | Age<100)
mean(Elders$Cost, na.rm=TRUE)
#78.8445

I understand this should give me the mean cost for people between 55 and 100. In this case, it's 78.8445.

Sounds great. BUT, to check, I compare it to ages 95-100:

Elders2<-subset(midus,Age>=95 | Age<100)
mean(Elders2$Cost, na.rm=TRUE)
#78.8445

It seems very unlikely to me that these two subsets have identical means. And I can't figure out what I did wrong to make it think that they do. Anyone have any ideas?

Appreciate the help. I've lurked on Stack Overflow since starting this class and it's helped me immensely.

IRTFM
Chris Kilbourn
  • This seems less of a problem with `subset` and more of a problem in that the computer gave you *exactly* what you asked for instead of what you intended. – Dason Oct 19 '13 at 20:51
  • @DWin in this case I think it's much simpler. x > a | x < b (with a – Michele Oct 19 '13 at 23:34

2 Answers


I find the `[` syntax less confusing than `subset`. You haven't given us a sample of your data, but something like this should work. And surely you mean AND (&) rather than OR (|) in your code?

Elders <- midus[midus$Age >= 55 & midus$Age < 100, ]

Also check out this question and the answers.
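One practical difference between the two syntaxes is worth a quick sketch. The toy data frame below is hypothetical (made-up values, with column names borrowed from the question) and only illustrates how `[` and `subset` treat a missing `Age`:

```r
# Hypothetical toy data with one missing age (values made up for illustration).
toy <- data.frame(Age = c(60, NA, 30, 99), Cost = c(10, 20, 30, 40))

# The condition evaluates to TRUE, NA, FALSE, TRUE.
by_bracket <- toy[toy$Age >= 55 & toy$Age < 100, ]
by_subset  <- subset(toy, Age >= 55 & Age < 100)

nrow(by_bracket)  # `[` keeps an all-NA row where the condition is NA
nrow(by_subset)   # subset() drops rows where the condition is NA

# Wrapping the condition in which() makes `[` drop those rows as well.
by_which <- toy[which(toy$Age >= 55 & toy$Age < 100), ]
```

With `na.rm=TRUE` in `mean()` the extra NA row does not change the result, so either syntax gives the same mean here as long as the condition uses `&`.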

SlowLearner
  • Truthfully, I don't understand why, but using [] instead solved the problem. Thanks a lot SlowLearner. – Chris Kilbourn Oct 19 '13 at 20:13
  • "And surely you mean AND (&) rather than OR (|) in your code" is why this code worked and your code didn't. – TheComeOnMan Oct 19 '13 at 20:43
  • @ChrisKilbourn the problem is neither `[]` nor `subset`. You used the wrong logical condition, so you computed the mean of the **whole** table in both cases. Have you tried `mean(midus$Cost, na.rm=TRUE)`? – Michele Oct 19 '13 at 23:36

Here's a solution using `subset`:

> # generating some data
> set.seed(1)
> midus <- data.frame(ID=1:50,
+                     Age=sample(20:100, 50, TRUE), 
+                     Cost=rnorm(50, 100, 3))
> 
> Elders <- subset(midus, Age>= 55 & Age<100) # subsetting
> mean(Elders$Cost) 
[1] 100.2068
> 
> Elders2<-subset(midus, Age>=95 & Age<100)
> mean(Elders2$Cost)
[1] 98.78458

As you can see, just changing | to & gives what you want. This is because you want those values of Age between 55 AND 100 (not including 100), so you need to use the & operator.
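To see why the original `|` version returned the same mean both times, it may help to look at the logical vectors directly. This is a minimal sketch on simulated data (hypothetical values, not the real `midus` table):

```r
# Hypothetical toy data; column names mirror the question, values are simulated.
set.seed(1)
midus <- data.frame(Age = sample(20:100, 50, replace = TRUE),
                    Cost = rnorm(50, mean = 100, sd = 3))

# Every age is >= 55 or < 100 (usually both), so OR never filters a row out.
or_cond  <- midus$Age >= 55 | midus$Age < 100
and_cond <- midus$Age >= 55 & midus$Age < 100

all(or_cond)    # TRUE: the OR "subset" is the whole table
sum(and_cond)   # number of rows with Age in [55, 100)

# With OR, the subset mean is therefore just the overall mean.
or_mean  <- mean(subset(midus, Age >= 55 | Age < 100)$Cost)
all_mean <- mean(midus$Cost)
isTRUE(all.equal(or_mean, all_mean))
```

The same reasoning applies to `Age>=95 | Age<100`: any non-missing age satisfies at least one side, which is why both of the original subsets produced the mean of the full table.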

Jilber Urbina