I am quite new to R and hope you can help me. I have a dataset with 22 variables and more than 50000 rows. For further calculations I want to select the top 5 values in every column and delete the others. How can I do that?

Thanks for your help.

S_U

1 Answer

You should give us a reproducible example.

We can use `apply` and `sort` to achieve this. In the example below, `dat` is the original data frame and `dat2` is the final output.

set.seed(123)

dat <- as.data.frame(matrix(rnorm(50), ncol = 5))

#             V1         V2         V3          V4          V5
#  1  -0.56047565  1.2240818 -1.0678237  0.42646422 -0.69470698
#  2  -0.23017749  0.3598138 -0.2179749 -0.29507148 -0.20791728
#  3   1.55870831  0.4007715 -1.0260044  0.89512566 -1.26539635
#  4   0.07050839  0.1106827 -0.7288912  0.87813349  2.16895597
#  5   0.12928774 -0.5558411 -0.6250393  0.82158108  1.20796200
#  6   1.71506499  1.7869131 -1.6866933  0.68864025 -1.12310858
#  7   0.46091621  0.4978505  0.8377870  0.55391765 -0.40288484
#  8  -1.26506123 -1.9666172  0.1533731 -0.06191171 -0.46665535
#  9  -0.68685285  0.7013559 -1.1381369 -0.30596266  0.77996512
# 10  -0.44566197 -0.4727914  1.2538149 -0.38047100 -0.08336907

dat2 <- as.data.frame(apply(dat, 2, function(x) sort(x, decreasing = TRUE)[1:5]))
dat2
#           V1        V2         V3        V4          V5
# 1 1.71506499 1.7869131  1.2538149 0.8951257  2.16895597
# 2 1.55870831 1.2240818  0.8377870 0.8781335  1.20796200
# 3 0.46091621 0.7013559  0.1533731 0.8215811  0.77996512
# 4 0.12928774 0.4978505 -0.2179749 0.6886403 -0.08336907
# 5 0.07050839 0.4007715 -0.6250393 0.5539177 -0.20791728
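
As a rough sketch of an alternative, assuming every column is numeric, the same top-5 values can be built column by column with `lapply`, which avoids converting the data frame to a matrix first. The name `dat3` is only illustrative. Note that `sort` drops `NA` values by default, so a column with fewer than five non-missing values would be padded with `NA` by the `[1:5]` indexing.

```r
set.seed(123)
dat <- as.data.frame(matrix(rnorm(50), ncol = 5))

# Sort each column in decreasing order and keep its first five entries.
# lapply() returns a list with one length-5 vector per column, which
# as.data.frame() turns back into a data frame with the original names.
dat3 <- as.data.frame(lapply(dat, function(x) sort(x, decreasing = TRUE)[1:5]))
dat3
```

For a data frame where all columns are numeric, this should hold the same values as `dat2` above; the difference is only in how the result is assembled.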
www
  • Why use `apply` on a data frame? That coerces it into a matrix; use `sapply` or `lapply` instead. – Onyambu Dec 28 '17 at 17:39
  • @Onyambu Not sure that is a major issue. It seems (this is an assumption, given the OP's lack of a [reproducible example](https://stackoverflow.com/a/5963610/5619526)) that each column in the OP's data is numeric. – bouncyball Dec 28 '17 at 17:45
  • @Onyambu Thanks for your comment. I feel like `apply` is more robust here. You are welcome to contribute an answer using `lapply` or `sapply`. Feel free to use my reproducible example. – www Dec 28 '17 at 17:54
  • Something like `sapply(dat, sort, decreasing = TRUE)[1:5, ]` should work. – Onyambu Dec 28 '17 at 17:56
  • @Onyambu The output is the same as `apply(dat, 2, function(x) sort(x, decreasing = TRUE)[1:5])`, which is a matrix. – www Dec 28 '17 at 18:04