I have a data set that contains 163 columns (variables) and 199,566 rows (observations). How can I eliminate redundant data? Can I do this by using the normal distribution?
Viewed 41 times
-3
-
We have too little information. What did you try, and are there any errors? You should take a look here: https://stats.stackexchange.com/a/6800 – Gaterde Apr 23 '18 at 10:29
-
Welcome to StackOverflow. In order to ask a better question, please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask), [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve), and [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Rui Barradas Apr 23 '18 at 10:30
-
Give an example of your data and what you have done so far. – Al14 Apr 23 '18 at 10:31
-
"Normalize" *means* "eleminate redundant data". But "redundant" depends on the situation. What makes data "redundant" in this situation? What *is* the situation? What are you trying to accomplish & how are you trying accomplishing it? – philipxy Apr 24 '18 at 01:43
1 Answer
0
Maybe try a dimensionality reduction method such as PCA. It will help you reduce the number of columns, which, if I understand correctly, is what you want to achieve.
If you haven't used such methods before, you will probably have to read more about what exactly these techniques do, but the above will get you started.
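To make the PCA suggestion concrete, here is a minimal sketch in Python with NumPy. It is not a definitive recipe: the function name `pca_reduce`, the 95% variance threshold, the choice to standardize every column, and the synthetic data are all assumptions made for illustration. The question itself does not say what the columns contain or which language is in use.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.95):
    """Project X (rows = observations, columns = variables) onto the fewest
    principal components whose cumulative explained variance reaches
    var_threshold. Hypothetical helper, for illustration only."""
    X = np.asarray(X, dtype=float)

    # PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)

    # Assumption: scale to unit variance so that columns measured in large
    # units do not dominate. Constant columns are left as zeros.
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Xs = Xc / std

    # Rows of Vt are the principal directions; the squared singular values
    # are proportional to the variance captured along each direction.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)

    # Smallest k whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(S))

    # Component scores: the data expressed in the first k directions.
    return Xs @ Vt[:k].T, explained[:k]


# Synthetic stand-in for the real data: 10 observed columns that are
# (almost) linear mixtures of only 3 underlying factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

scores, explained = pca_reduce(X, var_threshold=0.95)
print(scores.shape)
print(explained.round(3))
```

The returned `scores` array has the same number of rows as the input but fewer columns. Note that each new column is a linear combination of all the original variables, so PCA removes redundancy between columns rather than selecting a subset of the original ones.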

Vasilis Vasileiou