1

I am a novice in R but I am forced to use it for some data manipulation/cleaning. I tried to look for a solution using the aggregate or the table functions but I wasn't successful. Basically, I want to group my data frame by days summing all the equal days and values of a specific column while keeping unique subsets of all the other columns.

I clarify with a simple example.

This is the starting table:

+------------+------+----------+----------+----------+--------+
|    Date    | Term | Currency | Country1 | Country2 | Volume |
+------------+------+----------+----------+----------+--------+
| 2006-01-02 | SN   | EUR      | France   | Germany  |      5 |
| 2006-01-02 | SN   | EUR      | France   | Italy    |     10 |
| 2006-01-02 | SN   | EUR      | France   | Germany  |     15 |
| 2006-01-02 | ON   | EUR      | Germany  | Italy    |     10 |
| 2006-01-03 | SN   | EUR      | France   | Germany  |     10 |
| 2006-01-03 | SN   | EUR      | France   | Italy    |      5 |
| 2006-01-03 | ON   | EUR      | France   | Germany  |      5 |
+------------+------+----------+----------+----------+--------+

The first and third rows have equal day ("Date") and are equal also for "Term", "Currency", "Country1" and "Country2". Therefore, I want to group them by "Date" and sum over "Volume".

The final table should look like this:

+------------+------+----------+----------+----------+--------+
|    Date    | Term | Currency | Country1 | Country2 | Volume |
+------------+------+----------+----------+----------+--------+
| 2006-01-02 | SN   | EUR      | France   | Germany  |     20 |
| 2006-01-02 | SN   | EUR      | France   | Italy    |     10 |
| 2006-01-02 | ON   | EUR      | Germany  | Italy    |     10 |
| 2006-01-03 | SN   | EUR      | France   | Germany  |     10 |
| 2006-01-03 | SN   | EUR      | France   | Italy    |      5 |
| 2006-01-03 | ON   | EUR      | France   | Germany  |      5 |
+------------+------+----------+----------+----------+--------+

Essentially, I want to keep all the unique combinations of "Term", "Currency", "Country1" and "Country2" for each day and sum all the values of "Volume" of that day.

Is there a simple way to do so? I managed to sum by days over "Volume" but not having the conditions of the other unique variables.

mydata <- data.frame("Date" = c('2006-01-02','2006-01-02','2006-01-02','2006-01-02','2006-01-03','2006-01-03','2006-01-03'),
                     "Term" = c('SN', 'SN', 'SN', 'ON','SN','SN','ON'), 
                     "Currency" = c('EUR','EUR','EUR','EUR','EUR','EUR','EUR'),
                     "Country1" = c('France','France','France','Germany','France','France','France'),
                     "Country2" = c('Germany','Italy','Germany','Italy','Germany','Italy','Germany'),
                     "Volume" = c('5','10','15','10','10','5','5'))

Let me know if the issue is not clear.

Thanks!

chrisprog
  • 11
  • 2

0 Answers0