Keep the most frequent columns into a data.frame

Question

Having a data.frame like this:

data.frame(id = c(1,2,3), stock = c(3,1,4), bill = c(1,0,1), bear = c(3,2,5))

How is it possible to sum all columns expect id column and keep the first two columns which have higher frequency.

Example of expected output

data.frame(id = c(1,2,3), stock = c(3,1,4), bear = c(3,2,5))

Expected output is not reproducible. – s_baldur Feb 04 '20 at 09:57 — s_baldur, Feb 04 '20 at 09:57

score 3 · Accepted Answer · answered Feb 04 '20 at 09:57

3

In base R, we can use colSums to sum the columns, sort them and select the name of 2 highest valued columns using tail.

cbind(df[1], df[names(tail(sort(colSums(df[-1])), 2))])

#  id stock bear
#1  1     3    3
#2  2     1    2
#3  3     4    5

answered Feb 04 '20 at 09:57

Ronak Shah

Thank you, it seeems to be the right option. Any idea how could I handle an error like this `Error: cannot allocate vector of size 3.5 Gb`? – Nathalie Feb 04 '20 at 10:01
@Nathalie Not exactly, sure how you can optimize for the case here but this is a common problem and there are suggestions https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session?rq=1 and https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb – Ronak Shah Feb 04 '20 at 10:08

score 2 · Answer 2 · answered Feb 04 '20 at 10:01

2

Another base R possibility could be:

cbind(df[1], df[-1][rank(-colSums(df[-1])) %in% 1:2])

  id stock bear
1  1     3    3
2  2     1    2
3  3     4    5

answered Feb 04 '20 at 10:01

tmfmnk

2 Answers2