0

Having a data.frame like this:

data.frame(id = c(1,2,3), stock = c(3,1,4), bill = c(1,0,1), bear = c(3,2,5))

How is it possible to sum all columns expect id column and keep the first two columns which have higher frequency.

Example of expected output

data.frame(id = c(1,2,3), stock = c(3,1,4), bear = c(3,2,5))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
Nathalie
  • 1,228
  • 7
  • 20

2 Answers2

3

In base R, we can use colSums to sum the columns, sort them and select the name of 2 highest valued columns using tail.

cbind(df[1], df[names(tail(sort(colSums(df[-1])), 2))])

#  id stock bear
#1  1     3    3
#2  2     1    2
#3  3     4    5
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you, it seeems to be the right option. Any idea how could I handle an error like this `Error: cannot allocate vector of size 3.5 Gb`? – Nathalie Feb 04 '20 at 10:01
  • @Nathalie Not exactly, sure how you can optimize for the case here but this is a common problem and there are suggestions https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session?rq=1 and https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb – Ronak Shah Feb 04 '20 at 10:08
2

Another base R possibility could be:

cbind(df[1], df[-1][rank(-colSums(df[-1])) %in% 1:2])

  id stock bear
1  1     3    3
2  2     1    2
3  3     4    5
tmfmnk
  • 38,881
  • 4
  • 47
  • 67