
I am currently studying quadratic discriminant analysis (QDA) and am using R to analyze my data.

The data were downloaded from the following link:

https://www.kaggle.com/uciml/pima-indians-diabetes-database

I want to check the QDA assumption that each of the two groups is multivariate normally distributed, so I used the following commands in R:

library(MVN)
group1 <- discrim[1:500, 1:8]
result <- mardiaTest(group1, qqplot = FALSE)   # check whether group1 is multivariate normal
group2 <- discrim[501:765, 1:8]
result2 <- mardiaTest(group2, qqplot = TRUE)   # check whether group2 is multivariate normal
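For readers who want to see what this check computes without depending on a particular MVN version, here is a minimal base-R sketch of Mardia's multivariate skewness and kurtosis tests, the idea behind `mardiaTest()`. It assumes the textbook asymptotic formulas; the MVN package may apply small-sample corrections, so its p-values can differ slightly. `mardia_stats` is a hypothetical helper name, not part of MVN.

```r
# Hypothetical helper: Mardia's skewness/kurtosis tests using the
# textbook asymptotic formulas (no small-sample correction).
mardia_stats <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))        # center each column
  S <- crossprod(Xc) / n                # ML covariance estimate (divisor n)
  D <- Xc %*% solve(S, t(Xc))           # D[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  b1 <- sum(D^3) / n^2                  # multivariate skewness
  b2 <- sum(diag(D)^2) / n              # multivariate kurtosis
  skew_stat <- n * b1 / 6               # ~ chi-squared under normality
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_z <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # ~ N(0, 1) under normality
  list(
    b1 = b1,
    b2 = b2,
    skew_p = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    kurt_p = 2 * pnorm(-abs(kurt_z))
  )
}

set.seed(1)
normal_data <- matrix(rnorm(2000), ncol = 2)   # 1000 rows of bivariate normal data
skewed_data <- matrix(rexp(2000), ncol = 2)    # 1000 rows of strongly skewed data
mardia_stats(normal_data)$b2                   # should be near p(p + 2) = 8
mardia_stats(skewed_data)$skew_p               # very small: normality rejected
```

With the question's objects this would be called as `mardia_stats(group1)`. Note that `D` is an n x n matrix, so memory grows quadratically with the number of rows; that is fine for a few hundred observations per group.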

Both groups turned out not to be multivariate normal, so I want to normalize the data. For the first group I wrote the following:

x1bar <- t(t(as.vector(sapply(as.data.frame(group1), mean))))   # 8 x 1 matrix of group1's column means
x1bartilda <- (x1bar - mean(x1bar)) / sd(x1bar)                  # standardizes those 8 means against each other
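The snippet above standardizes `x1bar`, the vector of the 8 column means, rather than the observations themselves. A minimal sketch of column-wise standardization (subtract each column's mean from that column, then divide by that column's standard deviation) is below. It uses the built-in `iris` measurements as stand-in data, and `standardize_columns` is a hypothetical helper that should agree with `scale()`. This only re-centers and rescales; as the comments on this question point out, it cannot make non-normal data normal.

```r
# Hypothetical helper: z-score every column of a numeric data frame or matrix.
standardize_columns <- function(X) {
  X <- as.matrix(X)
  centered <- sweep(X, 2, colMeans(X))          # subtract each column's mean
  sweep(centered, 2, apply(X, 2, sd), "/")      # divide by each column's sd
}

Z <- standardize_columns(iris[, 1:4])
round(colMeans(Z), 10)                          # every column mean is 0
apply(Z, 2, sd)                                 # every column sd is 1
all.equal(Z, scale(iris[, 1:4]), check.attributes = FALSE)   # TRUE: same as scale()
```

If you do standardize before QDA, one common choice is to use the same centers and scales for both groups (for example the overall column means), since centering each group on its own mean removes the mean difference between the groups.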

I did the same for group2. However, the resulting mean vector for group1 was not close to 0.

Can anyone advise me on the way forward?

Sandipan Dey
Annalise Azzopardi
  • What is the real question? Using a mean which is out of the group for standardising a variable is bound to not yield a zero-mean result... – AlexR Jan 22 '17 at 08:26
  • How can I normalize my data and continue applying QDA to it? – Annalise Azzopardi Jan 22 '17 at 08:30
  • That depends on the actual data distribution and is a question better suited for Cross Validated (stats.SE). – AlexR Jan 22 '17 at 08:31
  • OK, thank you @AlexR, I will post it on Cross Validated. – Annalise Azzopardi Jan 22 '17 at 08:33
  • Be prepared to show some density plots of your data there. Usually, transforming a variable so that it's normally distributed is more difficult than just calling `scale()` on it. – AlexR Jan 22 '17 at 08:35
  • Or, to be more explicit than AlexR's comment: scaling a variable will not make a non-normally distributed variable normal. What it will do is put all your variables on the same measurement scale. – user20650 Jan 22 '17 at 15:26
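To illustrate the last comment with a sketch: skewness does not depend on location or scale, so standardizing a right-skewed variable leaves its skewness exactly as it was. The `sample_skewness` helper is hypothetical and uses the simple moment estimator; exponential data stand in for a skewed variable.

```r
# Hypothetical helper: moment-based sample skewness (no small-sample correction).
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
x <- rexp(1000)                          # right-skewed variable (theoretical skewness 2)
sample_skewness(x)                       # clearly positive
sample_skewness(as.vector(scale(x)))     # same value: scaling does not remove skew
```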

1 Answer


If you want to normalize to mean 0 and standard deviation 1, you can use the `scale()` function.

Example:

my_data <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
my_data_scaled <- scale(my_data)

summary(my_data_scaled)

The result is:

   x                  y           
 Min.   :-1.91046   Min.   :-1.59037  
 1st Qu.:-0.57552   1st Qu.:-0.39842  
 Median : 0.06104   Median : 0.01998  
 Mean   : 0.00000   Mean   : 0.00000  
 3rd Qu.: 0.47280   3rd Qu.: 0.84296  
 Max.   : 1.74638   Max.   : 1.10514

The mean of each column is now 0.
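`summary()` shows the means but not the standard deviations. The sketch below checks both and shows that `scale()` stores the centers and scales it used in the `scaled:center` and `scaled:scale` attributes, so the same transformation can be applied to new rows. `set.seed(1)` is added for reproducibility, so the numbers will not match the output shown earlier; `new_rows` is hypothetical data.

```r
set.seed(1)
my_data <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
my_data_scaled <- scale(my_data)

colMeans(my_data_scaled)          # ~0 for each column (up to floating-point error)
apply(my_data_scaled, 2, sd)      # 1 for each column

# scale() records the centers and scales it used, so new rows can be
# transformed with exactly the same values.
ctr <- attr(my_data_scaled, "scaled:center")
scl <- attr(my_data_scaled, "scaled:scale")
new_rows <- data.frame(x = c(30, 30.1), y = c(4, 4.5))   # hypothetical new observations
scale(new_rows, center = ctr, scale = scl)
```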

seralouk