I'm trying to plot an overlay of two histograms, where one represents all of the data and the other represents a subset that meets a criterion. I have a .csv
file that I import as a data frame, code that builds a new data frame with the group labels, and a ggplot call that plots them together.
My code is as follows:
library(ggplot2)

# Read the data and label every row as part of the full set
dataHaploid <- read.csv("Graphes/Hap.csv")
dataHaploid$Group <- "All Loci"

# Copy the rows that meet the criterion and label them as the subset
dht <- dataHaploid[dataHaploid$Divergence > 0.01, ]
dht$Group <- "Divergent Loci"
dataHaploid <- rbind(dataHaploid, dht)

# Subset a: keep a single parameter combination
dha <- dataHaploid[dataHaploid$BetaDist == 0 &
                   dataHaploid$SignalWidth == 10 &
                   dataHaploid$EcolWidth == 10, ]

# Overlay both groups in one histogram
dhaa <- ggplot(dha, aes(MeanRecombinationRate, fill = Group)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 25) +
  ylim(0, 100) + xlim(0, 0.5)
dhaa
When I plot it, however, I get the following graph:
Since the blue is a subset of the red, there should never be a solid blue area, yet there is one in a couple of the bins. Additionally, when I plot all of the data together, I get the following:
which shows that there shouldn't be any data in those bins anyway, if the binning scheme were consistent.
I'm not sure what to do here, or how to debug this; any help is appreciated.
I've tried modifying the code that builds the new dataset, and have confirmed that the resulting set has 630 points (500 from the original), that 130 points match the criterion, and that 130 points are in the second bin. I've also tried changing the number of bins, and the problem occurs with different bin counts. It happens in all of my subsets, not just subset a, which is the one whose code I include above.
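A base-R cross-check along these lines can confirm the per-bin counts independently of ggplot2. This is only a sketch: the data are synthetic stand-ins (the real Hap.csv is not reproduced here), the Divergence range is made up, and the 25 equal-width bins on [0, 0.5] are an assumption meant to mirror bins=25 with xlim(0, .5), not necessarily the exact edges ggplot2 chooses.

```r
# Synthetic stand-in for the real data: 500 loci, column names copied from the question.
set.seed(1)
all_loci <- data.frame(
  MeanRecombinationRate = runif(500, min = 0, max = 0.5),
  Divergence            = runif(500, min = 0, max = 0.04)  # made-up range
)
all_loci$Group <- "All Loci"

# Same construction as in the question: the subset is appended as a second group.
divergent <- all_loci[all_loci$Divergence > 0.01, ]
divergent$Group <- "Divergent Loci"
combined <- rbind(all_loci, divergent)

# Assumed bin edges: 25 equal-width bins spanning 0 to 0.5.
breaks <- seq(0, 0.5, length.out = 26)
bin <- cut(combined$MeanRecombinationRate, breaks = breaks, include.lowest = TRUE)

# Rows are bins, columns are groups.
counts <- table(bin, combined$Group)
print(counts)

# Bins where the subset count exceeds the full-data count (there should be none).
bad_bins <- rownames(counts)[counts[, "Divergent Loci"] > counts[, "All Loci"]]
print(bad_bins)
```

If bad_bins comes back empty for the real data as well, the counts themselves are consistent, which would point at how the bars are drawn rather than at the data.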
Following the question "R geom_histogram position="identity" inconsistent":
This also persists if I set binwidth = 0.01, boundary = 0 instead of specifying the number of bins; the bins are also set by xlim and the number of bins, so they should be consistent. With that set:
Oddly, though, plotting all of the data yields 2 empty bins, even though there is only one blue-only bin in the first plot:
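To see the counts ggplot2 itself computes for each bar, layer_data() can be called on the plot object. The sketch below is hedged: it uses synthetic data in place of Hap.csv (the Divergence range is made up), and it deliberately omits ylim() and xlim(), since those set scale limits (which, per the ggplot2 documentation, drop values outside the range) rather than just zooming the view.

```r
library(ggplot2)

# Synthetic stand-in for the real data; column names copied from the question.
set.seed(1)
all_loci <- data.frame(
  MeanRecombinationRate = runif(500, min = 0, max = 0.5),
  Divergence            = runif(500, min = 0, max = 0.04)  # made-up range
)
all_loci$Group <- "All Loci"
divergent <- all_loci[all_loci$Divergence > 0.01, ]
divergent$Group <- "Divergent Loci"
combined <- rbind(all_loci, divergent)

# Same layer as in the question, with explicit bin width and boundary,
# but without ylim()/xlim() so that no bars are dropped by scale limits.
p <- ggplot(combined, aes(MeanRecombinationRate, fill = Group)) +
  geom_histogram(alpha = 0.5, position = "identity",
                 binwidth = 0.01, boundary = 0)

# One row per drawn bar: group index, bin edges, and the computed count.
bars <- layer_data(p)
print(bars[, c("group", "xmin", "xmax", "count")])
```

Comparing this table with the plotted bars should show whether a blue-only bin reflects a real count or a bar that was dropped when drawing.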