
I have 251,296 DNA segment lengths (each relative to its chromosome arm), like these:

0.24592963
0.08555043
0.02128725
...

The values range from 0 to 2, and I would like to make a continuous relative frequency plot. I know I could bin the values and use a histogram, but I would like to show continuity. Is there a simple strategy? If not, I'll use binning. Thank you!

EDIT:

I have created a binning vector of 40 equally spaced values between 0 and 2 (both included). For simplicity's sake, is there a way to round each of the 251,296 entries to the closest value in the binning vector? Thank you!
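One way to do the rounding asked about in the EDIT, as a sketch: treat the midpoints between neighboring grid values as cut points and let base R's `findInterval()` look up the nearest grid value for every entry at once. The function name `snap_to_grid` is hypothetical, and the three example values stand in for the full vector of lengths.

```r
# 40 equally spaced values from 0 to 2, both ends included (as in the EDIT).
grid <- seq(0, 2, length.out = 40)

# Hypothetical helper: return the grid value closest to each element of x.
snap_to_grid <- function(x, grid) {
  # Midpoints between neighboring grid values are the rounding boundaries.
  mids <- (grid[-1] + grid[-length(grid)]) / 2
  # findInterval() counts how many midpoints are <= each x; adding 1 gives the
  # index of the nearest grid value. Values below 0 or above 2 are clamped to
  # the first or last grid value. A value exactly on a midpoint rounds up.
  grid[findInterval(x, mids) + 1]
}

d <- c(0.24592963, 0.08555043, 0.02128725)  # replace with all 251296 lengths
snapped <- snap_to_grid(d, grid)

# Relative frequency of every grid value, keeping empty ones as 0.
rel_freq <- table(factor(snapped, levels = grid)) / length(snapped)
```

Because the grid here is equally spaced, `round(d / step) * step` with `step <- 2 / 39` would give the same result; the midpoint approach also works for unevenly spaced grids.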

Johnathan
  • When you say "continuous relative frequency plot" do you mean that some of your 251,296 entries are duplicated and you want this frequency to be plotted on the y-axis? – Nathan S. Watson-Haigh May 12 '15 at 03:19
  • Use a kernel density plot. There are a few ways to do that in R (or any other software) if you look it up. – Frank May 12 '15 at 03:20
  • @NathanS.Watson-Haigh Hi! I mean that the values come from a continuous variable. There may be some duplicates. :) – Johnathan May 12 '15 at 03:24
  • Are you looking for something like `x <- runif(100, 0, 2); hist(x, freq=FALSE); lines(density(x))`? If so, see this question: [Fitting a density curve to a histogram in R](http://stackoverflow.com/questions/1497539/fitting-a-density-curve-to-a-histogram-in-r) – Jota May 12 '15 at 03:33

1 Answer

Since most of your values are not duplicated, there is no natural count to plot on the y-axis, so I'd go for a kernel density plot. It highlights dense regions of segment lengths, i.e. where many segment lengths occur close to each other.

# Example values; replace with the full vector of 251296 segment lengths
d <- c(0.24592963, 0.08555043, 0.02128725)
plot(density(d), xlab="DNA Segment Length", xlim=c(0,2))
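For intuition about what `density()` draws: with its defaults (Gaussian kernel, bandwidth `h` from `bw.nrd0()`), the estimate at `x` is the average of Gaussian curves centred on each data point, f(x) = 1/(n h) * sum_i phi((x - x_i) / h). The hand-rolled `kde_gaussian()` below is a hypothetical helper for illustration only; `density()` itself convolves binned data via an FFT, which is far faster for 251,296 points, so the two agree only approximately.

```r
# Hypothetical helper: direct Gaussian kernel density estimate, mirroring the
# defaults of density() (kernel = "gaussian", bw = bw.nrd0(data)).
kde_gaussian <- function(x_eval, data, bw = bw.nrd0(data)) {
  # f_hat(x0) = mean of N(x0; x_i, bw) over all data points x_i
  vapply(x_eval,
         function(x0) mean(dnorm(x0, mean = data, sd = bw)),
         numeric(1))
}

d <- c(0.24592963, 0.08555043, 0.02128725)  # example values from the question

# Evaluate on [0, 2] only; note that some probability mass still lies below 0
# for lengths near the boundary, it is just not drawn.
dens <- density(d, from = 0, to = 2)
manual <- kde_gaussian(dens$x, d)

plot(dens, xlab = "DNA Segment Length", main = "")
lines(dens$x, manual, lty = 2)  # dashed: direct computation, should overlap
```

With `from`/`to` set, `density()` evaluates on exactly 512 points spanning [0, 2], which avoids drawing the curve over impossible negative lengths.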

Nathan S. Watson-Haigh
  • Hi! Thank you for your quick response. I did it and it looks nice. However, I would also like the segments near each value to show up, so I think I'll use a relative-frequency histogram AND plot the density function. – Johnathan May 12 '15 at 03:37
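A minimal sketch of the histogram-plus-density combination from the last comment, assuming all lengths lie in [0, 2] (otherwise `hist()` stops with an error) and reusing the 40 equally spaced values as bin edges, which gives 39 bins. `hist(..., freq = FALSE)` draws a density scale (bar areas sum to 1), not relative frequencies, so here the counts are divided by their total and the density curve is multiplied by the bin width to put both on the same axis. The `rug()` call is one assumed way to make individual segments visible.

```r
d <- c(0.24592963, 0.08555043, 0.02128725)  # replace with all 251296 lengths
breaks <- seq(0, 2, length.out = 40)          # 40 edges -> 39 bins
bin_width <- diff(breaks)[1]

h <- hist(d, breaks = breaks, plot = FALSE)
# Turn counts into relative frequencies so the bar heights sum to 1.
h$counts <- h$counts / sum(h$counts)

dens <- density(d, from = 0, to = 2)
# A density integrates to 1; scaling by the bin width matches the bar heights.
dens_rel <- dens$y * bin_width

plot(h, ylim = c(0, max(h$counts, dens_rel)),
     xlab = "DNA Segment Length", ylab = "Relative frequency", main = "")
lines(dens$x, dens_rel)
rug(d)  # one tick per segment; with ~250k values a random subsample may be clearer
```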