-1

I have a data set that I would like to sum up all of Action_1 and Action_2 and according to USER_IDs (eventually removing duplicates in USER_ID)

User_ID    Action_1   Action_2
user_001   1          1
user_001   1          0
user_001   0          1
user_002   1          1
user_002   0          1

...

Outcome:

User_ID   Action_1   Action_2
user_001  2          2
user_002  1          2

...

Is there a way to accomplish this in R? I do apologize in advance as I do not know the terminology of this technique.

Thanks.

New2Data
  • 43
  • 6
  • 1
    the "terminology" is `group_by` and `aggregate` . read about group by function in sql - it could be useful. – vagabond Nov 18 '16 at 05:19

3 Answers3

1
library(data.table)
df = fread("User_ID    Action_1   Action_2
       user_001   1          1
       user_001   1          0
       user_001   0          1
       user_002   1          1
       user_002   0          1")

#incase df was a data.frame in your code then use setDT() to make it data.table, which can help you with speed. 

df[,.(Action_1=sum(Action_1), Action_2=sum(Action_2)),"User_ID"]

#    User_ID a b
#1: user_001 2 2
#2: user_002 1 2

I f there are many columns and all are such numeric ones on which yo want to sum-up, then based on @thelatemail you

dat[, lapply(.SD,sum), by=User_ID]

If you’re using several but not all of the columns, you can combine .SD with .SDcols too!

sd.cols = c("Action_1", "Action_2")
dat[, lapply(.SD,sum), by=User_ID, .SDcols = sd.cols]
joel.wilson
  • 8,243
  • 5
  • 28
  • 48
0

This is the exact use case for the aggregate function:

aggregate(df[-1], by = df["User_ID"], FUN = sum)

The by parameter is a list of factors indicating groups in the first element that the function FUN should be applied to.

Or using the formula interface:

aggregate(. ~ User_ID, df, FUN = sum)
Barker
  • 2,074
  • 2
  • 17
  • 31
0

Considering input as:

structure(list(User_Id = structure(c(1L, 1L, 1L, 2L, 2L), 
.Label = c("user_001", "user_002"), class = "factor"), 
Action_1 = c(1, 1, 0, 1, 0), Action_2 = c(1, 0, 1, 1, 1)), 
.Names = c("User_Id", "Action_1", "Action_2"), 
row.names = c(NA, -5L), class = "data.frame")

You can try tidyverse as:

df_test %>%
  group_by(User_Id) %>%
  summarise( Action_1 = sum(Action_1), Action_2 = sum(Action_2))
Aramis7d
  • 2,444
  • 19
  • 25