In my lattice histogram:

    histogram(~bill|group,data=mydat,type='count',nint=50,layout=c(9,3))

How can I keep only the "bill" values below the 99th percentile within each group?

kuki
  • This is very unclear. What do you mean by "keep"? – joran Jun 18 '13 at 14:40
  • If the data are very skewed, you could add `scales = list(x = list(log = 2))` to the arguments for `histogram` as a potential alternative to eliminating data. – BenBarnes Jun 18 '13 at 14:43
  • @joran "Keep" means retain; the rest (the top 1%) will be trimmed. – kuki Jun 18 '13 at 14:54
  • Do you want to remove outliers from all the bill data or within each group? – Seth Jun 18 '13 at 14:57
  • @BenBarnes I don't necessarily need a log transformation at this stage; the data has outliers that all sit in the top 1%, while the other 99% show a normal distribution. – kuki Jun 18 '13 at 14:58
  • @Seth Yes! And I already know that all these outliers (within each group) are in the top 1%, which is why I want to take them off. – kuki Jun 18 '13 at 14:59

1 Answer

You may want to remove outliers from the whole of the bill data. First, copy your data into a new variable:

    mydat$bill.cleaned <- mydat$bill

Then set the large values to missing:

    # qnorm gives the 99th percentile of a normal distribution fitted to the data
    cutoff <- qnorm(.99, mean(mydat$bill), sd(mydat$bill))
    mydat$bill.cleaned[which(mydat$bill > cutoff)] <- NA

Then you can display your histogram of cleaned data.
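Note that `qnorm` returns the 99th percentile of a normal distribution with the given mean and standard deviation, and both of those are pulled upward by the outliers themselves. A minimal sketch of the same step using the empirical 99th percentile from `quantile` instead; the toy `mydat` below is made up for illustration and only stands in for the real data:

```r
library(lattice)

# Toy data standing in for the real `mydat` (hypothetical values)
set.seed(1)
mydat <- data.frame(
  group = rep(c("a", "b", "c"), each = 200),
  bill  = rnorm(600, mean = 100, sd = 10)
)
mydat$bill[c(1, 201, 401)] <- 1000  # plant a few extreme values

# Empirical 99th percentile of all bill values
cutoff <- quantile(mydat$bill, 0.99, na.rm = TRUE)

# Copy the column, then set everything above the cutoff to missing
mydat$bill.cleaned <- mydat$bill
mydat$bill.cleaned[mydat$bill > cutoff] <- NA

# Plot the cleaned column; print() is needed when the script is sourced
print(histogram(~ bill.cleaned | group, data = mydat,
                type = "count", nint = 50))
```

Which cutoff is appropriate depends on the data: the empirical quantile always trims about 1% of the values, while the `qnorm` cutoff trims only what lies beyond the fitted normal's 99th percentile.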

Alternatively, if you want to remove outliers only within each group, you need to do the same thing as above with an additional apply statement.
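A minimal sketch of the per-group version, using base R's `ave` (the function the asker reports settling on in the comments) in place of an explicit apply call. The toy `mydat` is an assumption made for illustration, and the cutoff here is the empirical `quantile` rather than `qnorm`:

```r
# Toy data standing in for the real `mydat` (hypothetical values)
set.seed(1)
mydat <- data.frame(
  group = rep(c("a", "b", "c"), each = 200),
  bill  = rnorm(600, mean = rep(c(50, 100, 200), each = 200), sd = 10)
)

# For every row, the 99th percentile of bill within that row's group
group.cutoff <- ave(mydat$bill, mydat$group,
                    FUN = function(x) quantile(x, 0.99, na.rm = TRUE))

# Keep values at or below their own group's cutoff; set the rest to NA
mydat$bill.cleaned <- ifelse(mydat$bill > group.cutoff, NA, mydat$bill)

# Number of values trimmed in each group
trimmed <- tapply(is.na(mydat$bill.cleaned), mydat$group, sum)
print(trimmed)
```

Because `ave` returns a vector aligned with the original rows, each value is compared against its own group's cutoff without reshaping the data. The cleaned column can then be passed to `histogram(~bill.cleaned|group, ...)` as in the question.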

Seth
  • Thank you @Seth! I later also found a similar solution: http://stackoverflow.com/a/4788102/2078985 – kuki Jun 18 '13 at 15:13
  • How do I "do the same thing as above with an additional apply statement"? I searched for a while but couldn't find a solution. – kuki Jun 18 '13 at 17:09
  • I got the problem solved using ave (learnt from another post). Thanks! – kuki Jun 18 '13 at 18:45