I have a tbl_df with 328040 rows:
head(homVar)
  sample CHROM     POS ID    QUAL DP
1   H001 chr2L   43265  . 1790.77 50
2   H001 chr2L  950701  .  396.78 15
3   H001 chr2L  950723  .  430.77 14
4   H001 chr2L  950730  .  350.77 11
5   H001 chr2L 1648327  .  494.77 14
6   H001 chr2L 3274239  .  203.84  6
The column 'sample' is a character vector with values from H001 up to H230. The column 'CHROM' is a factor with seven levels. The 'POS' values for a given CHROM are not necessarily unique. Each row corresponds to a position of genetic variation, and the number of rows differs between samples.
What I'm generally trying to do is plot the frequency of variants by position, CHROM and sample. I can draw an ordinary bar histogram, but it is not practical to interpret visually. I can also draw a density plot, but that does not show the absolute counts, which are the most informative.
Specifically, I'd like to generate the data for a histogram and then plot it as lines, while keeping the separation by sample and CHROM. That is, in windows of e.g. 100000 along POS, count the number of rows for each sample and CHROM.
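The windowed count described above might look roughly like the following. This is only a sketch, assuming dplyr is available; the data frame `toy`, its values, and the names `window`, `bin_start` and `n_variants` are made up for illustration and are not taken from homVar.

```r
library(dplyr)

# Toy stand-in for homVar (illustrative values only, not real data)
toy <- tibble(
  sample = c("H001", "H001", "H001", "H002", "H002"),
  CHROM  = factor(c("chr2L", "chr2L", "chr2L", "chr2L", "chr2R")),
  POS    = c(43265, 950701, 950723, 43300, 120000)
)

window <- 100000  # hypothetical window size along POS

binned <- toy %>%
  # left edge of the window that each position falls into
  mutate(bin_start = floor(POS / window) * window) %>%
  # number of rows per sample, CHROM and window
  count(sample, CHROM, bin_start, name = "n_variants")

binned
```

Windows that contain no variants are absent from `binned` rather than present with a count of zero, so a line drawn from it would join the neighbouring non-empty windows directly.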
Code for the density plot is:
my.plot <- ggplot(homVar, aes(POS, col=sample)) +
  geom_density(weight=0.5) +
  facet_wrap(~CHROM, ncol=1)
my.plot
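For comparison, a count-based version of the same plot could be sketched with ggplot2's geom_freqpoly, which bins the x values and draws the counts as lines. This is an untested-on-real-data sketch: `toy` and `freq.plot` are hypothetical names, the toy values are invented, and the binwidth of 100000 is just the example window size mentioned above.

```r
library(ggplot2)

# Toy stand-in for homVar (illustrative values only, not real data)
toy <- data.frame(
  sample = c("H001", "H001", "H001", "H002", "H002"),
  CHROM  = factor(c("chr2L", "chr2L", "chr2L", "chr2L", "chr2R")),
  POS    = c(43265, 950701, 950723, 43300, 120000)
)

# One line per sample, one panel per CHROM, y axis = absolute counts per bin
freq.plot <- ggplot(toy, aes(POS, colour = sample)) +
  geom_freqpoly(binwidth = 100000) +
  facet_wrap(~CHROM, ncol = 1)
freq.plot

# The binned counts behind the lines can be pulled out with ggplot_build
binned <- ggplot_build(freq.plot)$data[[1]]
```

The `binned` data frame holds one row per bin, group and panel, with the count in its `count` column.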
I've been looking at the ggplot_build function and at the information in these questions:
Making ggplot2 plot density histograms as lines
Need to extract data from the ggplot geom_histogram
Any advice on how to plot the faceted, multi-series histogram as lines would be much appreciated.