I'm trying to plot an overlay of two histograms, where one represents all the data and the other represents a subset that meets a criterion. I have a .csv file that I import as a data frame, code that builds a new data frame with the categories, and a ggplot call that plots them together.

My program code is as follows:

  library(ggplot2)

  dataHaploid <- read.csv("Graphes/Hap.csv")
  dataHaploid$Group <- "All Loci"

  # Rows above the divergence threshold, appended as a second group
  dht <- dataHaploid[dataHaploid$Divergence > .01, ]
  dht$Group <- "Divergent Loci"
  dataHaploid <- rbind(dataHaploid, dht)

  # Subset a: BetaDist == 0, SignalWidth == 10 and EcolWidth == 10
  dha <- dataHaploid[dataHaploid$BetaDist == 0 &
                       dataHaploid$SignalWidth == 10 &
                       dataHaploid$EcolWidth == 10, ]

  # Overlaid histograms, one per Group
  dhaa <- ggplot(dha, aes(MeanRecombinationRate, fill = Group)) +
    geom_histogram(alpha = .5, position = "identity", bins = 25) +
    ylim(0, 100) + xlim(0, .5)
  dhaa

When I plot it however, I get the following graph:

overlaid histogram

As the blue is a subset of the red, there should never be a solid blue area, as there is in a couple of the bins. Additionally, when I plot all the data together, I get the following:

total histogram

which shows that there shouldn't be any data in those bins anyway, if the binning scheme were consistent.

I'm not sure what to do here, or how to debug this; any help is appreciated.

I've tried modifying the code that builds the new dataset, and have confirmed that the resulting set has 630 points (500 from the original), that 130 points match the criteria, and that 130 points are in the second bin. I've also tried changing the number of bins, and the problem occurs with different bin counts. It happens in all of my subsets, not just subset a, which is the one whose code I included.

Following the question "R geom_histogram position="identity" inconsistent": the problem also persists if I set binwidth = 0.01, boundary = 0 instead of specifying the number of bins. The bins are also set by xlim and the number of bins, so they should be consistent. With binwidth and boundary set, I get:

overlay histogram with binwidth

Oddly, though, plotting all of the data together yields 2 empty bins, even though there is only one blue-only bin in the first plot:

total histogram with binwidth

  • 2
    Welcome to SO! It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. – stefan Mar 10 '23 at 06:35

1 Answer

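Whoops, figured it out: the ylim was cutting out the bars, since the data had more than 100 points in those bins. ylim(0, 100) sets the limits of the y scale, and ggplot2 drops any bar whose height falls outside those limits instead of clipping it, which is why only the smaller blue bar was left. I feel silly. I found this out when I wasn't able to make a dataset that reproduced the issue: it needed more than 100 points in a bin to trigger the problem, and that exceeded the character limit. Anyway, I appreciate the edit and the comment; I'm new to Stack Overflow.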
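
For anyone who hits the same thing, here is a minimal sketch of the effect. It uses made-up data, not the original Hap.csv: the data frame names, the seed, and the choice of binwidth = 0.02 are assumptions picked only so that one bin holds more than 100 points. It compares ylim(), which sets scale limits, with coord_cartesian(ylim = ...), which only zooms the view; swapping one for the other is one possible fix, not necessarily the one used above.

```r
library(ggplot2)

# Made-up data (an assumption, not the original file): 500 points,
# 150 of which fall inside a single 0.02-wide bin.
set.seed(42)
all_loci <- data.frame(
  MeanRecombinationRate = c(runif(350, 0.04, 0.5), runif(150, 0.021, 0.039)),
  Group = "All Loci"
)

# Every fourth row stands in for the "divergent" subset.
divergent <- all_loci[seq(1, 500, by = 4), ]
divergent$Group <- "Divergent Loci"
both <- rbind(all_loci, divergent)

base_plot <- ggplot(both, aes(MeanRecombinationRate, fill = Group)) +
  geom_histogram(alpha = .5, position = "identity",
                 binwidth = 0.02, boundary = 0)

# ylim() sets scale limits: the "All Loci" bar taller than 100 is dropped,
# leaving only the shorter "Divergent Loci" bar visible in that bin.
clipped <- base_plot + ylim(0, 100)

# coord_cartesian() only zooms: the tall bar is kept and cut off at the top.
zoomed <- base_plot + coord_cartesian(ylim = c(0, 100))

# Bars removed by the scale limits show up as NA in the computed layer data.
dropped <- subset(layer_data(clipped), is.na(ymax))
```

Printing clipped should show the solid blue bin along with a warning about removed rows, while zoomed keeps both bars. Inspecting layer_data() like this is also a quick way to check per-bin counts when a histogram looks wrong.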