The following problem is giving me a real headache.
I have a large dataset that looks like this:
Name Date C1 C2 C3 C4 C5 C6 C7
A 2008-01-03 100
A 2008-01-05 NA
A 2008-01-07 120
A 2008-02-03 NA
A 2008-03-10 50
A 2008-07-14 70
A 2008-07-15 NA
A 2009-01-03 40
A 2009-01-05 NA
A 2010-01-07 NA
A 2010-03-03 30
A 2010-03-10 20
A 2011-07-14 10
A 2011-07-15 NA
B 2008-01-03 NA
B 2008-01-05 5
B 2008-01-07 3
B 2008-02-03 11
B 2008-03-10 13
B 2008-07-14 ....
As you can see, there are many NAs in my observations. The other columns look similar, and the dataset has more than 100,000 rows, so it is huge.
I want to aggregate my data as follows. Taking C1 as an example: I want to compute the monthly average for each Name, for every year and month in a time frame of roughly 2000-01 to 2012-12.
The monthly average should be calculated from whichever dates are available in each month.
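Since the post contains no code yet, the averaging rule can be sketched language-neutrally. This is a minimal sketch in Python (standard library only), assuming NA is represented by `None`, dates are ISO `YYYY-MM-DD` strings, and one column (C1) is processed at a time; `monthly_means` is a hypothetical helper name, not an existing function. The sample rows are the ones for A shown above.

```python
from collections import defaultdict

NA = None  # stand-in for R's NA in this sketch


def monthly_means(rows):
    """Average one column per (name, 'YYYY-MM'), ignoring NA entries.

    rows: iterable of (name, 'YYYY-MM-DD', value) tuples; value may be NA.
    Returns {(name, 'YYYY-MM'): mean, or NA if every value was NA}.
    """
    groups = defaultdict(list)
    for name, date, value in rows:
        month = date[:7]  # '2008-01-03' -> '2008-01'
        groups[(name, month)].append(value)

    result = {}
    for key, values in groups.items():
        present = [v for v in values if v is not NA]
        # A month that has rows but only NA values stays NA
        result[key] = sum(present) / len(present) if present else NA
    return result


rows = [
    ("A", "2008-01-03", 100), ("A", "2008-01-05", NA), ("A", "2008-01-07", 120),
    ("A", "2008-02-03", NA), ("A", "2008-03-10", 50),
    ("A", "2008-07-14", 70), ("A", "2008-07-15", NA),
]
print(monthly_means(rows))
```

In R, the same idea is a grouped mean over Name and the year-month part of Date, using `mean(x, na.rm = TRUE)` within each group.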
When the calculations are done, my dataset should look like this.
Name Date C1 C2 C3 C4 C5 C6 C7
A 2008-01 monthly average
A 2008-02 monthly average
A 2008-03 monthly average
A 2008-04 monthly average
A 2008-05 monthly average
A 2008-06 monthly average
A 2008-07 monthly average
A 2008-08 monthly average
A 2008-09 monthly average
A 2008-10 monthly average
A 2008-11 monthly average
A 2008-12 monthly average
A 2009-01 monthly average
B 2008-01 monthly average
B 2008-02 monthly average
B 2008-03 monthly average
B 2008-04 monthly average
B 2008-05 monthly average
B 2008-06 ....
So my output should contain one row per Name for every month of the year. Each value is either NA, if the month had only NA values, or the average of the available values in that month.
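The expected output also lists months that have no rows at all in the input (for example A 2008-04 to 2008-06), so grouping alone is not enough: the full Name × month grid has to be built and then filled in. A minimal sketch in Python, assuming the per-month averages are already in a dictionary keyed by `(name, 'YYYY-MM')`; `month_range` and `complete_grid` are hypothetical helpers, and the sample averages are derived from the rows for A shown above.

```python
NA = None  # stand-in for R's NA in this sketch


def month_range(start, end):
    """Yield 'YYYY-MM' strings from start to end inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1


def complete_grid(means, names, start, end):
    """One row per (name, month); months without any observation become NA."""
    return [(name, m, means.get((name, m), NA))
            for name in names
            for m in month_range(start, end)]


# Per-month averages for A (C1), computed from the sample rows above
means = {("A", "2008-01"): 110.0, ("A", "2008-02"): NA, ("A", "2008-03"): 50.0}
for row in complete_grid(means, ["A"], "2008-01", "2008-05"):
    print(row)
```

For the full range 2000-01 to 2012-12 this produces 156 rows per Name.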
Example 1 (some values are NA):
Name Date C1
A 2008-01-03 100
A 2008-01-05 NA
A 2008-01-07 120
Here we would expect:
Name Date C1
A 2008-01 (100+120)/2 = 110
Example 2 (all values are NA):
Name Date C1
A 2008-01-03 NA
A 2008-01-05 NA
A 2008-01-07 NA
Here we would expect:
Name Date C1
A 2008-01 NA
Example 3 (no NA values):
Name Date C1
A 2008-01-03 100
A 2008-01-05 50
A 2008-01-07 120
Here we would expect:
Name Date C1
A 2008-01 (100+50+120)/3 = 90
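The three examples all follow one rule: average the non-NA values, and return NA if there are none. A minimal sketch in Python that reproduces them, assuming `None` stands for NA; `mean_ignoring_na` is a hypothetical helper.

```python
NA = None  # stand-in for R's NA in this sketch


def mean_ignoring_na(values):
    """Average of the non-NA values; NA if there are none."""
    present = [v for v in values if v is not NA]
    return sum(present) / len(present) if present else NA


print(mean_ignoring_na([100, NA, 120]))  # example 1 -> 110.0
print(mean_ignoring_na([NA, NA, NA]))    # example 2 -> None
print(mean_ignoring_na([100, 50, 120]))  # example 3 -> 90.0
```

When porting this to R, note that `mean(x, na.rm = TRUE)` returns `NaN` rather than `NA` when every value is NA, so example 2 needs an explicit check if a true NA is wanted.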
As I am relatively new to R and don't know how to solve this, I am hoping someone can show me how a problem like this can be solved. I would be really thankful for your support :)