I have a table like this:
+--------+--------+--------+--------+
| Value1 | Value2 | Value3 | Value4 |
+--------+--------+--------+--------+
| 1 | 2 | 3 | 4 |
| 1 | 2 | 3 | 4 |
| 1 | 2 | 5 | 7 |
| 2 | 7 | 5 | 6 |
| 1 | 2 | 3 | 4 |
| 8 | 5 | 1 | 3 |
| 2 | 7 | 5 | 6 |
+--------+--------+--------+--------+
I want to efficiently do this:
sum Value4 across all rows with the same value of Value1, Value2 and Value2 and create a new row (in another data table) with corresponding Value1, Value2, Value3 and sum of Value4's. So the output would be something like this:
+--------+--------+--------+--------+
| Value1 | Value2 | Value3 | Value4 |
+--------+--------+--------+--------+
| 1 | 2 | 3 | 12 |
| 1 | 2 | 5 | 7 |
| 2 | 7 | 5 | 12 |
| 8 | 5 | 1 | 3 |
+--------+--------+--------+--------+
I am actually able to do this in R, but not efficiently. Since my dataset has 24 000 000 rows with 13 variables, my code would take like 2 weeks to terminate.
edit: note that in my actual dataset, variables are not necessarily integers, but also characters.