R ggplot2 multi-series histogram as lines

Question

I've got a tbl_df of 328040 rows:

head(homVar)

  sample CHROM     POS ID    QUAL DP
1   H001 chr2L   43265  . 1790.77 50
2   H001 chr2L  950701  .  396.78 15
3   H001 chr2L  950723  .  430.77 14
4   H001 chr2L  950730  .  350.77 11
5   H001 chr2L 1648327  .  494.77 14
6   H001 chr2L 3274239  .  203.84  6

The column 'sample' is a character column with values ranging from H001 up to H230. The column 'CHROM' is a factor with seven levels. 'POS' values are not necessarily unique within a CHROM. Each row corresponds to a position of genetic variation, and the number of rows differs between samples.

What I'm generally trying to do is plot the frequency of variants by position, CHROM and sample. I am able to do a normal bar histogram but it's not practical for visual interpretation. I'm able to do a density plot but this doesn't show the absolute counts which are the most informative.

Specifically, what I'd like to do is generate the data for a histogram and then plot it as lines, while retaining the separation by sample and CHROM. That is, for windows of e.g. 100000 along POS, count the number of rows for each sample and CHROM.

Code for the density plot is:

library(ggplot2)

my.plot <- ggplot(homVar, aes(POS, col = sample)) +
  geom_density(weight = 0.5) +
  facet_wrap(~CHROM, ncol = 1)
my.plot

I've been looking at the ggplot_build function and at these related questions:

Making ggplot2 plot density histograms as lines

Need to extract data from the ggplot geom_histogram

Any advice on how to plot the facet, multi-series histogram as a line would be much appreciated.
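
The windowed count described in the question can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy `homVar` below is made-up data standing in for the real table, and the names `window`, `bin_start` and `counts` are hypothetical. Windows that contain no variants get no row in the result, which matters if the counts are later drawn as connected lines.

```r
# Toy stand-in for homVar (values are made up for illustration only).
set.seed(1)
homVar <- data.frame(
  sample = rep(c("H001", "H002"), each = 500),
  CHROM  = factor(sample(c("chr2L", "chr2R"), 1000, replace = TRUE)),
  POS    = sample.int(2e6, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

window <- 1e5  # assumed window size along POS

# Left edge of the window each position falls into:
# [0, 1e5), [1e5, 2e5), ...
homVar$bin_start <- (homVar$POS %/% window) * window

# One row per sample x CHROM x window, with the number of rows in it.
# Empty windows are simply absent from the result.
counts <- aggregate(
  list(n = rep(1L, nrow(homVar))),
  by  = homVar[c("sample", "CHROM", "bin_start")],
  FUN = sum
)
head(counts)
```

A table like `counts` could then be passed to ggplot2 with `bin_start` on the x axis and `n` on the y axis, coloured by sample and faceted by CHROM.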

Do you seriously want 7*230 histograms in one plot? In any case, I found the most straightforward approach for this is to generate the data for the histogram using the base R `hist` function and then feed that to `geom_step` — mts, Jul 01 '15 at 11:00
@mts You don't need `hist` for this: http://stackoverflow.com/a/31069426/1412059 — Roland, Jul 01 '15 at 11:14
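
One way to get absolute counts drawn as lines without a separate `hist` step is ggplot2's `geom_freqpoly`, which bins the data itself. The sketch below is not necessarily what the linked answer does; the toy `homVar` is made-up data, and the bin width of 100000 is taken from the question.

```r
library(ggplot2)

# Toy stand-in for homVar (values are made up for illustration only).
set.seed(1)
homVar <- data.frame(
  sample = rep(c("H001", "H002"), each = 500),
  CHROM  = factor(sample(c("chr2L", "chr2R"), 1000, replace = TRUE)),
  POS    = sample.int(2e6, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Frequency polygon: histogram counts joined by lines,
# one line per sample, one panel per CHROM.
p <- ggplot(homVar, aes(POS, colour = sample)) +
  geom_freqpoly(binwidth = 1e5, boundary = 0) +
  facet_wrap(~CHROM, ncol = 1)
p

# The binned counts computed by the layer can be inspected if needed.
binned <- layer_data(p, 1)
head(binned[c("x", "count", "PANEL", "group")])
```

With 230 samples the legend and colours will be crowded, so hiding the legend or plotting a subset of samples may be worth considering.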
