I have 13 data points of sales starting on 20th of Jan, on an interval of 28 days. This gives me 13 data points. How to convert this into monthly sales ( approximation) on R
Asked
Active
Viewed 54 times
-1
-
2Please provide a working minimal example of your data. This way, you will get higher quality answers. – LAP Nov 07 '16 at 08:39
-
1Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Jaap Nov 07 '16 at 08:51
-
In addition to the other comments which ask for a minimal working example, be more specific on the expected output, please. Do you want to simply add the sales values when the data points happen to fall in the same month? (Which will create an artificial "peak" month.) Or, do you want the sales values to be interpolated (average sales per day in given interval) and subsequently aggregated by month? For the latter, do the sales values given on a date belong to the previous period, ending on the given date? – Uwe Nov 09 '16 at 05:34
1 Answers
1
You can e.g. combine the aggregate
function and the months
function in a manner like the following. The months
function returns the month of a given date. The aggregate
function, in this case, sums the sales over each month.
# Create some toy data
set.seed(1)
df <- data.frame(date = as.Date("2016-01-20") + 0:12*20,
sales = runif(13))
head(df)
# date sales
#1 2016-01-20 0.26550866
#2 2016-02-09 0.37212390
#3 2016-02-29 0.57285336
#4 2016-03-20 0.90820779
#5 2016-04-09 0.20168193
#6 2016-04-29 0.89838968
# Create the month names and numbers
df$month_num <- format(df$date, format = "%m")
df$month_lex <- months(df$date)
head(df)
# date sales month_num month_lex
#1 2016-01-20 0.26550866 01 januar
#2 2016-02-09 0.37212390 02 februar
#3 2016-02-29 0.57285336 02 februar
#4 2016-03-20 0.90820779 03 marts
#5 2016-04-09 0.20168193 04 april
#6 2016-04-29 0.89838968 04 april
# Aggregate by month number to get correct sorting
aggregate(sales ~ month_lex*month_num, sum, data = df)
# month_lex month_num sales
#1 januar 01 0.26550866
#2 februar 02 0.94497726
#3 marts 03 0.90820779
#4 april 04 1.10007162
#5 maj 05 0.94467527
#6 juni 06 1.28991184
#7 juli 07 0.06178627
#8 august 08 0.38253133
#9 september 09 0.68702285
You can leave out month_lex*
in the aggregate
call to not get the month names passed on.
Please ignore the fact that I'm on a danish system. If you would like more specific answers, I guess you should provide a minimal data example showing your input and expect output.
EDIT: Updated to accomodate @UweBlock comments.

Anders Ellern Bilgrau
- 9,928
- 1
- 30
- 37
-
You are using only 20 days per interval instead of 28. This is why your "business year" is ending prematurely in September. – Uwe Nov 09 '16 at 05:12
-
Using `months()` for grouping is almost always a bad idea because the result is sorted alphabetically which in any language destroys the natural sequence of months. You are better off with the numerical month, the month naming according to ISO standards, e.g, 2016-09, or turning the full month name into an appropriately sorted factor. – Uwe Nov 09 '16 at 05:47
-
@UweBlock I have updated the answer regarding the sorting order. Regarding the 20 days instead of 28, this was a mistake on my part and gives slightly "incorrect" toy data, but it has no bearing real impact on the answers and solution to the problem. – Anders Ellern Bilgrau Nov 09 '16 at 10:15