I am a novice in R but I am forced to use it for some data manipulation/cleaning. I tried to look for a solution using the aggregate or the table functions but I wasn't successful. Basically, I want to group my data frame by days summing all the equal days and values of a specific column while keeping unique subsets of all the other columns.
I clarify with a simple example.
This is the starting table:
+------------+------+----------+----------+----------+--------+
| Date | Term | Currency | Country1 | Country2 | Volume |
+------------+------+----------+----------+----------+--------+
| 2006-01-02 | SN | EUR | France | Germany | 5 |
| 2006-01-02 | SN | EUR | France | Italy | 10 |
| 2006-01-02 | SN | EUR | France | Germany | 15 |
| 2006-01-02 | ON | EUR | Germany | Italy | 10 |
| 2006-01-03 | SN | EUR | France | Germany | 10 |
| 2006-01-03 | SN | EUR | France | Italy | 5 |
| 2006-01-03 | ON | EUR | France | Germany | 5 |
+------------+------+----------+----------+----------+--------+
The first and third rows have equal day ("Date") and are equal also for "Term", "Currency", "Country1" and "Country2". Therefore, I want to group them by "Date" and sum over "Volume".
The final table should look like this:
+------------+------+----------+----------+----------+--------+
| Date | Term | Currency | Country1 | Country2 | Volume |
+------------+------+----------+----------+----------+--------+
| 2006-01-02 | SN | EUR | France | Germany | 20 |
| 2006-01-02 | SN | EUR | France | Italy | 10 |
| 2006-01-02 | ON | EUR | Germany | Italy | 10 |
| 2006-01-03 | SN | EUR | France | Germany | 10 |
| 2006-01-03 | SN | EUR | France | Italy | 5 |
| 2006-01-03 | ON | EUR | France | Germany | 5 |
+------------+------+----------+----------+----------+--------+
Essentially, I want to keep all the unique combinations of "Term", "Currency", "Country1" and "Country2" for each day and sum all the values of "Volume" of that day.
Is there a simple way to do so? I managed to sum by days over "Volume" but not having the conditions of the other unique variables.
mydata <- data.frame("Date" = c('2006-01-02','2006-01-02','2006-01-02','2006-01-02','2006-01-03','2006-01-03','2006-01-03'),
"Term" = c('SN', 'SN', 'SN', 'ON','SN','SN','ON'),
"Currency" = c('EUR','EUR','EUR','EUR','EUR','EUR','EUR'),
"Country1" = c('France','France','France','Germany','France','France','France'),
"Country2" = c('Germany','Italy','Germany','Italy','Germany','Italy','Germany'),
"Volume" = c('5','10','15','10','10','5','5'))
Let me know if the issue is not clear.
Thanks!