-1

I have 13 data points of sales starting on 20th of Jan, on an interval of 28 days. This gives me 13 data points. How to convert this into monthly sales ( approximation) on R

Rahul
  • 19
  • 3
  • 2
    Please provide a working minimal example of your data. This way, you will get higher quality answers. – LAP Nov 07 '16 at 08:39
  • 1
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Jaap Nov 07 '16 at 08:51
  • In addition to the other comments which ask for a minimal working example, be more specific on the expected output, please. Do you want to simply add the sales values when the data points happen to fall in the same month? (Which will create an artificial "peak" month.) Or, do you want the sales values to be interpolated (average sales per day in given interval) and subsequently aggregated by month? For the latter, do the sales values given on a date belong to the previous period, ending on the given date? – Uwe Nov 09 '16 at 05:34

1 Answers1

1

You can e.g. combine the aggregate function and the months function in a manner like the following. The months function returns the month of a given date. The aggregate function, in this case, sums the sales over each month.

# Create some toy data
set.seed(1) 
df <- data.frame(date = as.Date("2016-01-20") + 0:12*20,
                 sales = runif(13))
head(df)
#         date      sales
#1  2016-01-20 0.26550866
#2  2016-02-09 0.37212390
#3  2016-02-29 0.57285336
#4  2016-03-20 0.90820779
#5  2016-04-09 0.20168193
#6  2016-04-29 0.89838968

# Create the month names and numbers
df$month_num <- format(df$date, format = "%m")
df$month_lex <- months(df$date)
head(df)
#         date      sales month_num month_lex
#1  2016-01-20 0.26550866        01    januar
#2  2016-02-09 0.37212390        02   februar
#3  2016-02-29 0.57285336        02   februar
#4  2016-03-20 0.90820779        03     marts
#5  2016-04-09 0.20168193        04     april
#6  2016-04-29 0.89838968        04     april

# Aggregate by month number to get correct sorting
aggregate(sales ~ month_lex*month_num, sum, data = df)
#  month_lex month_num      sales
#1    januar        01 0.26550866
#2   februar        02 0.94497726
#3     marts        03 0.90820779
#4     april        04 1.10007162
#5       maj        05 0.94467527
#6      juni        06 1.28991184
#7      juli        07 0.06178627
#8    august        08 0.38253133
#9 september        09 0.68702285

You can leave out month_lex* in the aggregate call to not get the month names passed on.

Please ignore the fact that I'm on a danish system. If you would like more specific answers, I guess you should provide a minimal data example showing your input and expect output.

EDIT: Updated to accomodate @UweBlock comments.

Anders Ellern Bilgrau
  • 9,928
  • 1
  • 30
  • 37
  • You are using only 20 days per interval instead of 28. This is why your "business year" is ending prematurely in September. – Uwe Nov 09 '16 at 05:12
  • Using `months()` for grouping is almost always a bad idea because the result is sorted alphabetically which in any language destroys the natural sequence of months. You are better off with the numerical month, the month naming according to ISO standards, e.g, 2016-09, or turning the full month name into an appropriately sorted factor. – Uwe Nov 09 '16 at 05:47
  • @UweBlock I have updated the answer regarding the sorting order. Regarding the 20 days instead of 28, this was a mistake on my part and gives slightly "incorrect" toy data, but it has no bearing real impact on the answers and solution to the problem. – Anders Ellern Bilgrau Nov 09 '16 at 10:15