My imported data set consists of predetermined weight ranges and their probability density values. I have plotted these as a bar chart in R, so the plot looks like a histogram, but to R it is just a bar plot. I now need to overlay a curve on this bar chart for visualization purposes, using the same data.

The code I have used so far creates an odd-looking curve that doesn't line up with the bar chart. Any help would be hugely appreciated!

Code used so far:

barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")

fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = 0.2)
lines(fit1,col="blue", lwd=3)

Data:

Data10 <- structure(list(
ï..Weight.Range = c(0, 0.5, 1, 1.5, 2, 2.5, 3, 
3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 
11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 
17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21, 21.5, 22, 22.5, 23, 23.5, 
24, 24.5, 25, 25.5, 26, 26.5, 27, 27.5, 28, 28.5, 29, 29.5, 30, 
30.5, 31, 31.5, 32, 32.5, 33, 33.5, 34, 34.5, 35, 35.5, 36, 36.5, 
37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41, 41.5, 42, 42.5, 43, 
43.5, 44, 44.5, 45, 45.5, 46, 46.5, 47, 47.5, 48), pdf = c(0.012697609, 
0.015237131, 0.017776653, 0.019046414, 0.020694512, 0.022575831, 
0.024457151, 0.02633847, 0.028219789, 0.030101109, 0.031982428, 
0.033863747, 0.035745066, 0.037626386, 0.039507705, 0.041389024, 
0.043270343, 0.045151663, 0.042420729, 0.03688759, 0.033198831, 
0.029510072, 0.026374627, 0.023976934, 0.02264407, 0.021614794, 
0.020585518, 0.019556242, 0.018526967, 0.017497691, 0.016468415, 
0.015439139, 0.014409863, 0.013380587, 0.012351311, 0.011322035, 
0.009839476, 0.008433837, 0.007731017, 0.007028197, 0.005622558, 
0.004919738, 0.004568328, 0.004498046, 0.004427764, 0.004357482, 
0.0042872, 0.004216918, 0.004146636, 0.004076354, 0.004006072, 
0.00393579, 0.003865508, 0.003795226, 0.003724944, 0.003654663, 
0.003584381, 0.003514099, 0.003443817, 0.003373535, 0.003303253, 
0.003232971, 0.003162689, 0.003092407, 0.003022125, 0.002951843, 
0.002881561, 0.002811279, 0.002740997, 0.002670715, 0.002600433, 
0.002530151, 0.002459869, 0.002389587, 0.002319305, 0.002249023, 
0.002178741, 0.002108459, 0.002038177, 0.001967895, 0.001897613, 
0.001827331, 0.001757049, 0.001686767, 0.001616485, 0.001546203, 
0.001475921, 0.001405639, 0.001335357, 0.001265075, 0.001194794, 
0.001124512, 0.00105423, 0.000983948, 0.000913666, 0.000843384, 
0.000773102)
), class = "data.frame", row.names = c(NA, -97L))
MrFlick
l8c4g
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. If you're trying to treat this as a histogram, why not use `hist`? – camille Feb 06 '20 at 15:33
  • Hi there, I haven't been able to use `hist` since my data is already in weight ranges with their probability density values. I don't have access to the raw data. I assumed it would be wrong to use `hist` on data that has predetermined ranges and probability density values... I am very new to R, so maybe I was wrong to assume this? Ideally I need the curve to fit across the bar plot, and not sit far to the left of it... – l8c4g Feb 06 '20 at 15:42
  • Okay, that makes sense. Without having your data that wasn't clear – camille Feb 06 '20 at 16:03

2 Answers

barplot() returns the x positions of the bars it draws. Save that return value and use it as the x coordinates when drawing the new line.

my_bar <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")

fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = .2)

lines(my_bar, fit1$y,col="blue",type="l",lwd=3)
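
To see why the original curve sat to the left of the bars, it can help to compare the data scale with the positions that barplot() returns. The following is a minimal sketch; `x` and `dens` are a made-up stand-in for `Data10` (not the asker's values), and the default bar width and spacing are assumed.

```r
pdf(NULL)  # draw to a null device so the sketch runs without opening a window

# Hypothetical stand-in for Data10: range values 0, 0.5, ..., 48 and a made-up density
x    <- seq(0, 48, by = 0.5)
dens <- dgamma(x, shape = 3, scale = 4)

# barplot() invisibly returns the x position of each bar's midpoint
mids <- barplot(dens, names.arg = x)

range(x)     # data scale: 0 to 48
range(mids)  # plot scale: about 0.7 to 115.9 with the default width and spacing

# Midpoints are evenly spaced: bar width (1) plus the default gap (0.2)
unique(round(diff(as.vector(mids)), 10))
```

A curve drawn against `x` therefore only covers the left part of the plot, while a curve drawn against `mids` spans all the bars.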
sdhaus

The barplot function is meant to be used with a categorical variable, so it treats your x values as categories rather than as continuous numbers. When barplot runs, it calculates an x position (the bar midpoint) for each category and silently returns those positions. You can use the returned values together with the fitted y values from your smooth spline to draw the line. For example:

xx <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")

fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = 0.2)
lines(xx[,1], fit1$y,col="blue", lwd=3)
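
A different approach, not the one used in this answer, is to draw the bars on a true numeric axis with rect(), so the spline can be overlaid directly with lines(). This is only a sketch under assumptions: `x` and `dens` are a made-up stand-in for `Data10`, each range value is assumed to be the left edge of its bin, and the bin width `binw` is assumed to be 0.5.

```r
pdf(NULL)  # null device so the sketch runs without opening a window

# Hypothetical stand-in for Data10
x    <- seq(0, 48, by = 0.5)
dens <- dgamma(x, shape = 3, scale = 4)
binw <- 0.5  # assumed bin width, taken from the spacing of the range values

# Empty plot on the real Weight scale, then one rectangle per bin
plot(c(0, max(x) + binw), c(0, max(dens) * 1.1), type = "n",
     xlab = "Weight", ylab = "Probability Density", main = "Histogram")
rect(xleft = x, ybottom = 0, xright = x + binw, ytop = dens, col = "grey")

# Fit against the bin centres so the curve passes through the middle of each bar
fit <- smooth.spline(x + binw / 2, dens, spar = 0.2)
lines(fit, col = "blue", lwd = 3)
```

Because the bars and the curve share the same x scale here, no translation through the barplot() midpoints is needed, and the axis ticks show actual weights.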

MrFlick
  • Brilliant, thank you very much!! Just one quick question - what is 'xx[,1]' doing in the lines command? Just so that I fully understand what you have done! Thanks, very much – l8c4g Feb 06 '20 at 16:43
  • `barplot` returns a matrix; I was just grabbing the first column of that matrix. If you pass a simple vector of values to `barplot`, it doesn't really make much of a difference; I just wanted to be explicit. – MrFlick Feb 06 '20 at 17:56
  • Is it possible to extract the parameters from this curve? – l8c4g Feb 10 '20 at 12:09
  • There is no closed form for a smooth.spline. It’s a function of the observed data. If you want some sort of parameterized line, you would need to do some type of model fitting. – MrFlick Feb 10 '20 at 14:33
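
Although the spline has no closed-form parameters, the fitted object can still be evaluated at any weight with predict(). A minimal sketch, assuming a made-up stand-in for `Data10` (`x`, `dens`) and a hypothetical helper name `curve_fun`:

```r
# Hypothetical stand-in for Data10
x    <- seq(0, 48, by = 0.5)
dens <- dgamma(x, shape = 3, scale = 4)
fit  <- smooth.spline(x, dens, spar = 0.2)

# Evaluate the fitted curve at arbitrary weights
new_w <- c(2.25, 10, 33.3)
predict(fit, x = new_w)$y

# Wrap it as a function if that is more convenient (curve_fun is a made-up name)
curve_fun <- function(w) predict(fit, x = w)$y
curve_fun(10)

# The fit is summarised by its effective degrees of freedom and smoothing penalty,
# not by a formula with interpretable coefficients
fit$df
fit$lambda
```

This gives values of the curve rather than parameters; a parameterized description would still need a separate model fit, as the comment above says.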