Cleaning unnecessary variables in big data using R

Question

I have a data set that contains 163 columns (variables) and 199,566 rows (records). How can I eliminate redundant data? Can I do this using the normal distribution?

We have too little information. What have you tried, and are there any errors? You should take a look here: https://stats.stackexchange.com/a/6800 — Gaterde, Apr 23 '18 at 10:29
Welcome to StackOverflow. In order to ask a better question please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask) and [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve) and [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Rui Barradas, Apr 23 '18 at 10:30
"Normalize" *means* "eliminate redundant data". But "redundant" depends on the situation. What makes data "redundant" in this situation? What *is* the situation? What are you trying to accomplish, and how are you trying to accomplish it? — philipxy, Apr 24 '18 at 01:43

score 0 · Answer 1 · answered Apr 23 '18 at 10:47

Maybe try a dimensionality reduction method such as PCA (principal component analysis). It will help you reduce the number of columns, which, if I understand correctly, is what you want to achieve.
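Here is a minimal sketch of PCA with base R's `prcomp()`. It assumes a hypothetical data frame `df` standing in for the real data set; since no data was posted, `df` is simulated as 20 noisy mixtures of 5 underlying signals so the redundancy is known in advance. The 95% cumulative-variance cut-off is an arbitrary choice for illustration, not a rule.

```r
# Minimal PCA sketch using base R's prcomp().
# Assumption: `df` is a hypothetical stand-in for the real 163-column data set.
# It is simulated here: 20 columns that are noisy linear mixtures of 5 signals.
set.seed(42)
n <- 1000
signals <- matrix(rnorm(n * 5), ncol = 5)
mix <- matrix(runif(5 * 20), nrow = 5)
df <- as.data.frame(signals %*% mix + matrix(rnorm(n * 20, sd = 0.1), ncol = 20))

# PCA needs complete numeric data: keep numeric columns and drop rows with NAs
num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
num <- na.omit(num)
# Drop constant columns, because prcomp(scale. = TRUE) cannot rescale them
num <- num[, vapply(num, function(x) var(x) > 0, logical(1)), drop = FALSE]

# Centre and scale so variables measured in different units are comparable
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component, then cumulative
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Keep the smallest number of components reaching 95% (arbitrary cut-off)
k <- which(cum_var >= 0.95)[1]
reduced <- pca$x[, seq_len(k), drop = FALSE]

cat("Columns before:", ncol(num), " components kept:", k, "\n")
```

Note that PCA does not delete any of the original columns: each component is a linear combination of all of them, and `pca$rotation` holds the loadings that show how much each original variable contributes. `summary(pca)` prints the same cumulative proportions computed above.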

If you haven't used these techniques before, you will probably need to read more about what exactly they do, but the above should get you started.
