
Note: I found a similar question whose answer explains the problem. However, I'm looking for a solution rather than an explanation of why it's difficult (which I fully understand).

I have data that I want to plot as a histogram. The data has a count of 10000 in the bin [0, 200) and a count of 1 in several bins such as [30000, 30200). Both kinds of bin are important and need to be visible. To achieve this, I can draw the histogram with a log1p-transformed y-axis.

contig_len <- read.table(data_file, header = FALSE, sep = ",", col.names=c("Length"))
ggplot(contig_len, aes(x = Length)) + geom_histogram(binwidth=200) +
    scale_y_continuous(trans="log1p")

[Figure: good histogram]

This works perfectly! But now I want to split each bar by category (the Prevalence column), as follows:

ggplot(contig_len, aes(x = Length, fill = Prevalence)) +
    geom_histogram(binwidth=200, alpha=0.5, position="stack") +
    scale_y_continuous(trans = "log1p")

[Figure: bad histogram]

This doesn't work, however, as the stacking is performed without taking the log scale into account. Has anyone found a way around this problem? My data looks like this:

head(contig_len)
  Length      Prevalence
1    606 Repetitive (<5)
2    888  Non-Repetitive
3    192 Repetitive (<9)
4   9830  Non-Repetitive
5    506  Non-Repetitive
6    850  Non-Repetitive
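
To make the complaint concrete, here is a small arithmetic sketch. It assumes, as far as I understand ggplot2's build order, that each category's count is transformed by the y scale first and the transformed values are stacked afterwards. The counts below are hypothetical and not taken from my data.

```r
# Hypothetical per-category counts for a single bin (illustration only)
counts <- c(non_repetitive = 9000, repetitive = 1000)

# Bar height I expect: transform the bin total once
total_then_log <- log1p(sum(counts))

# Bar height if each category is transformed first and the pieces are then stacked
log_then_stack <- sum(log1p(counts))

print(c(total_then_log = total_then_log, log_then_stack = log_then_stack))
```

The first value is about 9.2 and the second about 16.0, so under that assumption the top of a stacked bar no longer corresponds to the log1p of the bin total.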
  • Welcome to Stack Overflow! For code debugging please always ask with a [reproducible](https://stackoverflow.com/q/5963269/1422451) example per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description, with the desired output. You can use `dput()`, `reprex::reprex()` or built-in data sets for reproducible data. Also, you might want to link to the question you've mentioned within your question. – Hack-R Jul 15 '18 at 15:15
  • This is bound to be a misleading chart, as relative heights and areas are not comparable. Facetting categories may be a better approach. – alistaire Jul 15 '18 at 15:17
  • This isn't nearly enough data to replicate the issue, i.e. there's likely only 1 observation here for any reasonable set of bins. – camille Jul 15 '18 at 16:57
  • For a reproducible version of this problem, try the following: `ggplot(diamonds, aes(x=carat, fill=cut)) + geom_histogram(bins=100, position="stack") + scale_y_continuous(trans="log1p")` – bdemarest Jul 15 '18 at 23:18
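
One possible workaround for the reproducible `diamonds` example above, offered as a sketch rather than a definitive answer: do the binning and stacking by hand in count space, then draw the pieces with `geom_rect()` so that the y scale only transforms the already-stacked boundaries. The bin width of 0.05 and the helper names (`hundredths`, `step`, `counts`) are assumptions made for illustration.

```r
library(ggplot2)

binwidth <- 0.05  # assumed bin width, for illustration only

# carat is recorded to two decimals, so bin on integer hundredths to avoid
# floating-point edge effects; `bin` is the left edge of each bin
hundredths <- round(diamonds$carat * 100)
step <- round(binwidth * 100)
bin <- (hundredths %/% step) * step / 100

# Count per (bin, cut) pair and drop empty combinations
counts <- as.data.frame(table(bin = bin, cut = diamonds$cut),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
counts$bin <- as.numeric(counts$bin)
counts$cut <- factor(counts$cut, levels = levels(diamonds$cut))

# Stack in count space: running totals within each bin
counts <- counts[order(counts$bin, counts$cut), ]
counts$ymax <- ave(counts$Freq, counts$bin, FUN = cumsum)
counts$ymin <- counts$ymax - counts$Freq

# The scale now transforms the stacked boundaries, so the top of each bar
# sits at log1p(bin total)
ggplot(counts, aes(xmin = bin, xmax = bin + binwidth,
                   ymin = ymin, ymax = ymax, fill = cut)) +
  geom_rect() +
  scale_y_continuous(trans = "log1p")
```

The caveat raised in the comments still applies: on a log-type axis the visible height of each coloured segment is not proportional to its count, and the category drawn at the bottom of a bar looks disproportionately large. Only the total bar height is directly readable.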

0 Answers