
I have a large dataset with the lifespans of threads on a discussion board. I want a histogram that shows the distribution of lifespans, so I did this:

dall <- read.csv("lifespan.csv")
colnames(dall) <- c("thread.id", "seconds.alive", "start.time")
hist(dall$seconds.alive)

which generated this hard-to-read image: http://dl.dropbox.com/u/285483/tmp/screenshot297.png

My questions are: a) is changing the y-axis to a log scale a good way to make it more readable? Apparently some people think it is a bad idea to change the y-axis to log.

b) how do I do that?

amh
  • Given that the bars start at zero, and log(0) is -infinity, what exactly would you want the graph to display? – hadley Nov 10 '10 at 21:59
  • Something similar was discussed @ CrossValidated: http://stats.stackexchange.com/questions/1764/what-are-alternatives-to-broken-axes – Roman Luštrik Nov 11 '10 at 09:19

1 Answer


I would try using hist(log10(dall$seconds.alive)) instead. This puts the lifespans (the x-axis) on a log scale rather than the counts.

Also try specifying breaks=100, or a smaller or larger number, to change how many bins are drawn (R treats a single number as a suggestion only):

hist(log10(dall$seconds.alive), breaks=100)
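
For question b), here is a minimal sketch of both approaches side by side. It assumes strictly positive lifespans and uses simulated data in place of lifespan.csv; the seconds.alive vector, the lognormal parameters, and the axis labels are made up for illustration. Since log(0) is -infinity, the second variant simply drops empty bins before plotting the counts on a log y-axis.

```r
# Simulated stand-in for dall$seconds.alive (hypothetical data, not the real file).
# The "+ 1" keeps every value strictly positive so log10() is defined.
set.seed(1)
seconds.alive <- round(rlnorm(10000, meanlog = 8, sdlog = 2)) + 1

# Option 1: histogram of log10(lifespan); the x-axis is in powers of ten.
hist(log10(seconds.alive), breaks = 100,
     xlab = "log10(seconds alive)", main = "Thread lifespan (log10 scale)")

# Option 2: keep the raw x-axis, but draw the counts on a log y-axis.
# Compute the bins without plotting, then plot the counts manually.
h <- hist(seconds.alive, breaks = 100, plot = FALSE)
nonempty <- h$counts > 0   # log(0) is -Inf, so empty bins cannot be drawn
plot(h$mids[nonempty], h$counts[nonempty], log = "y", type = "h",
     xlab = "seconds alive", ylab = "count (log scale)",
     main = "Thread lifespan (log y-axis)")
```

With heavily skewed data the first option usually spreads the bars out more evenly; the second keeps the original units on the x-axis but silently hides empty bins, which is worth stating in a caption.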
John_West