How to normally distribute the data using Statistics

Question

I have a data sample (say 1.9 4.8 3.9 4.7 2.3 4.6 3.9)

I want to distribute the data on a bell curve and give each value a rating from 1 to 5. How can I do this using statistics?

(The top 20% will be rated 5, the next 20% will be rated 4, and so on.)

See [this SO article](http://stackoverflow.com/questions/19343133/setting-upper-and-lower-limits-in-rnorm), which seems to address your question. — Tim Biegeleisen, Apr 06 '15 at 05:42
You basically want to use `rnorm` but limit the upper and lower values. — Tim Biegeleisen, Apr 06 '15 at 05:42
@DominicComtois My ratings will be from 1 to 5. Based on the normal distribution of my sample, each value in my sample should be assigned a rating from 1 to 5. — Frankenstein, Apr 06 '15 at 06:08

Dominic Comtois · Accepted Answer · 2015-04-06T13:28:03.157

Here's an answer to the first part of your question:

x <- c(1.9,4.8,3.9,4.7,2.3,4.6,3.9)

sigma <-  sd(x)  # 1.175747
mu <- mean(x)    # 3.728571

curve((1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2)),xlim=c(-1,9),ylab="density")

y <- (1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2))
points(x, y, col="red")

[Figure: normal distribution curve with the sample points in red]
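The expression passed to `curve()` above is the normal probability density function written out by hand. Base R also provides it as `dnorm()`, so a minimal check, assuming the same sample, is that both give the same heights at the data points (`y.manual` and `y.dnorm` are illustrative names, not part of the original answer):

```r
# Sample from the question
x <- c(1.9, 4.8, 3.9, 4.7, 2.3, 4.6, 3.9)
mu <- mean(x)
sigma <- sd(x)

# Normal density written out by hand, as in the answer above
y.manual <- (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu)^2) / (2 * sigma^2))

# Base R's built-in normal density with the same mean and standard deviation
y.dnorm <- dnorm(x, mean = mu, sd = sigma)

# The two should agree up to floating-point rounding
print(all.equal(y.manual, y.dnorm))
```

With that in mind, `curve(dnorm(x, mean = mu, sd = sigma), xlim = c(-1, 9), ylab = "density")` should draw the same curve with less room for typos in the formula.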

Second part

There's probably a more straightforward way to do it, but this does the trick:

p.quint <- qnorm(p = c(0, .2, .4, .6, .8, 1), mean = mu, sd = sigma)
names(p.quint) <- c(1:5, NA)

p.quint
#    1        2        3        4        5     <NA> 
# -Inf 2.739038 3.430699 4.026444 4.718105      Inf     
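`qnorm()` is the quantile function of the normal distribution: it turns cumulative probabilities into cut points on the data scale. A small sketch, assuming the same `mu` and `sigma`, checks this by mapping the cut points back with `pnorm()` and by rebuilding the finite ones from standard-normal quantiles as `mu + sigma * z` (`back` and `z` are illustrative names):

```r
x <- c(1.9, 4.8, 3.9, 4.7, 2.3, 4.6, 3.9)
mu <- mean(x)
sigma <- sd(x)

# Cut points that split the fitted normal into five equal-probability bands
p.quint <- qnorm(p = c(0, .2, .4, .6, .8, 1), mean = mu, sd = sigma)

# pnorm() undoes qnorm(): this should give back 0, .2, .4, .6, .8, 1
back <- pnorm(p.quint, mean = mu, sd = sigma)
print(back)

# The finite cut points are the standard-normal quintiles, scaled by sigma and shifted by mu
z <- qnorm(c(.2, .4, .6, .8))
print(mu + sigma * z)
```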

# For each value a, count how many cut points in p.quint it exceeds and use that
# count as the index into p.quint's names; store the resulting ratings in x.quint
x.quint <- unlist(lapply(x, function(a) as.numeric(names(p.quint))[sum(a > p.quint)]))

cbind(x, x.quint)

#        x x.quint
# [1,] 1.9       1
# [2,] 4.8       5
# [3,] 3.9       3
# [4,] 4.7       4
# [5,] 2.3       1
# [6,] 4.6       4
# [7,] 3.9       3
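Base R's `cut()` can do the same binning in one call: with its default right-closed intervals, a value lands in band k when it exceeds exactly k of the cut points, which is what `sum(a > p.quint)` counts. The sketch below assumes the same data; `rating.normal`, `s.quint` and `rating.sample` are illustrative names. As a swapped-in alternative that is not part of the original answer, it also bins on the sample's own quintiles from `quantile()`, which follows the "top 20%" wording more literally, although with only seven values the bands cannot hold exactly 20% each. Note that `cut()` stops with an error if ties make two of the sample breaks equal.

```r
x <- c(1.9, 4.8, 3.9, 4.7, 2.3, 4.6, 3.9)
mu <- mean(x)
sigma <- sd(x)

# Quintile cut points of the fitted normal, from -Inf to Inf
p.quint <- qnorm(c(0, .2, .4, .6, .8, 1), mean = mu, sd = sigma)

# cut() puts each value in the interval (p.quint[k], p.quint[k + 1]] and returns k
rating.normal <- cut(x, breaks = p.quint, labels = FALSE)

# Alternative: cut points taken from the sample's own quintiles instead of a normal fit;
# include.lowest = TRUE keeps the smallest value in band 1
s.quint <- quantile(x, probs = c(0, .2, .4, .6, .8, 1))
rating.sample <- cut(x, breaks = s.quint, include.lowest = TRUE, labels = FALSE)

print(cbind(x, rating.normal, rating.sample))
```

For this sample the normal-based ratings reproduce the `x.quint` column above, while the sample-quintile version rates the two 3.9 values as 2 and the 4.7 as 5.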

Previous answer for second part

[This was written before the OP mentioned that the desired output should represent quintiles.]

OK, I see what you mean now. Let's do it like this:

x <- c(1.9,4.8,3.9,4.7,2.3,4.6,3.9)

# sort x to simplify matters
x <- sort(x)

# center x around its mean (this shift does not change the range)
x.tr <- x - mean(x)

# Check range; we want it to be 4 (5 - 1)
range(x.tr)[2] - range(x.tr)[1]  # 2.9

# Stretch the centered data so that its range becomes 4
x.tr <- x.tr * 4/2.9

range(x.tr)[2] - range(x.tr)[1]
# [1] 4

# We also want the min to be 1
x.tr <- x.tr - (x.tr[1]-1)

mu <- mean(x.tr)   # 3.522167
sigma <- sd(x.tr)  # 1.62172
x <- x.tr
curve((1/(sigma*sqrt(2*pi)))*exp(-((x-mu)^2)/(2*sigma^2)),xlim=c(-1,9),ylab="density")

y.tr <- (1/(sigma*sqrt(2*pi)))*exp(-((x.tr-mu)^2)/(2*sigma^2))
points(x.tr, y.tr, col="blue")

Your points now range from 1 to 5, and the normal distribution fitted to them has the following parameters:

mu
# [1] 3.522167

sigma
# [1] 1.62172

[Figure: second normal distribution curve with the rescaled points in blue]
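The stretch-and-shift steps in this previous answer amount to a linear min-max rescaling, 1 + 4 * (x - min(x)) / (max(x) - min(x)). A compact sketch, assuming that is the intent; `rescale_to()` is a hypothetical helper, not a base R function:

```r
x <- c(1.9, 4.8, 3.9, 4.7, 2.3, 4.6, 3.9)

# Hypothetical helper: linearly maps the values of v onto [lo, hi],
# sending min(v) to lo and max(v) to hi while preserving their order
rescale_to <- function(v, lo = 1, hi = 5) {
  lo + (hi - lo) * (v - min(v)) / (max(v) - min(v))
}

x.tr <- rescale_to(sort(x))

print(range(x.tr))  # 1 5
print(mean(x.tr))   # about 3.522167
print(sd(x.tr))     # about 1.62172
```

Because the map is linear, the standard deviation is simply scaled by the same factor: 1.175747 * 4 / 2.9 is about 1.62172, matching the `sigma` reported above.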

Thanks for your response. I wanted to allocate ratings of 1 to 5 based on where the values fall on the bell curve (normal distribution). Could you share some guidance on this? — Frankenstein, Apr 06 '15 at 06:41
