
I’m trying to predict sales for each of the remaining days before the close. I’m using linear regression to predict each individual day based on the prior_week_average (among other factors), which is simply the average of the last seven days that we have in the data set. However, when the only remaining day is the close day, then I want to use a different function to predict that day. I can perform this manually, but I was hoping to find a way to do it with a loop.

We will use two functions to predict the remaining data points in the data set: close_day_lm and normal_day_lm.

• close_day_lm: used to predict Day0. Day0 ~ Total_Not_Including_Close_Sale + Close_Day_of_Week + Month
• normal_day_lm: used to predict all Day"X", except Day0. Day"X" ~ Prior_week_average + Close_Day_of_Week + Month
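To make the two formulas concrete, here is a minimal sketch of how the models might be fit in R. The `history` data frame and all of its values are made up for illustration, and the column names only follow the description above; the real data set may be laid out differently.

```r
# Synthetic stand-in for the real data; every value here is made up.
set.seed(42)
n <- 60
weekdays_abb <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
history <- data.frame(
  Day0 = runif(n, 80, 200),                              # close-day sales
  Day1 = runif(n, 50, 150),                              # one "normal" day
  Prior_week_average = runif(n, 50, 150),
  Total_Not_Including_Close_Sale = runif(n, 500, 1500),
  Close_Day_of_Week = factor(sample(weekdays_abb, n, replace = TRUE)),
  Month = factor(sample(month.abb, n, replace = TRUE))
)

# Close-day model: Day0 from the total of all other days.
close_day_lm <- lm(
  Day0 ~ Total_Not_Including_Close_Sale + Close_Day_of_Week + Month,
  data = history
)

# Normal-day model, shown here for Day1 as one example of Day"X".
normal_day_lm <- lm(
  Day1 ~ Prior_week_average + Close_Day_of_Week + Month,
  data = history
)

# Predict for a row taken from the same data, so its factor levels are known.
predict(close_day_lm, newdata = history[1, ])
predict(normal_day_lm, newdata = history[1, ])
```

Whether normal_day_lm should be one pooled model or refit for each Day"X" is not settled by the description; this sketch fits it for a single day only.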

The loop will need to tackle each value individually before moving on. For the first NA value (Day0 for 11/26/2016), it will use close_day_lm to predict Day0 for 11/26/2016. It will then need to add this data point to the data set, so we can use it in later iterations.

The loop will then need to fill in Day2 for 11/28/2016 using normal_day_lm. Prior_week_average will have to be recomputed for all rows as the average of Day3 to Day9 in each row. Using normal_day_lm with the new Prior_week_average, we will predict the value of Day2 for 11/28/2016 and then add this data point to the data set.

Next, we will predict Day1 of 11/28/2016. We will use the same normal_day_lm function, but Prior_week_average must be changed to take the average of the 7 data points prior to Day1, so it will now be the average of Day2 to Day8 for all rows. We will then predict Day1 of 11/28/2016 using normal_day_lm and add it to the data set.
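The shifting window described above could be computed with a small helper like the one below. This is only a sketch: the helper name `prior_week_average()`, the wide layout (one row per close date, columns named Day1, Day2, ...), and the toy numbers are all assumptions, not taken from the real data set.

```r
# Hypothetical helper: for target day x, average Day(x+1) .. Day(x+7) per row.
prior_week_average <- function(sales, x) {
  cols <- paste0("Day", (x + 1):(x + 7))
  rowMeans(sales[, cols, drop = FALSE])
}

# Toy data: row 1 is still missing Day1 and Day2, row 2 is complete.
sales <- data.frame(
  Day1 = c(NA, 12), Day2 = c(NA, 14), Day3 = c(10, 11), Day4 = c(12, 13),
  Day5 = c(14, 15), Day6 = c(16, 17), Day7 = c(18, 19), Day8 = c(20, 21),
  Day9 = c(22, 23)
)

prior_week_average(sales, 2)  # averages Day3..Day9: 16 for row 1, 17 for row 2
prior_week_average(sales, 1)  # averages Day2..Day8: NA for row 1 until Day2 is filled
```

The NA in the second call is the reason the predictions have to be added back to the data set one at a time: the window for Day1 is only complete once Day2 has been filled in.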

Next, we will predict Day0 of 11/28/2016 using the close_day_lm function. This function will use the variable total_not_including_close_sales, which is the sum of all days except Day0. We will use the Day0 data point predicted earlier for 11/26/2016. The predicted value will then be added to the data set. We will then repeat the same process for 12/2/2016, and so on until all the remaining “left” NAs are filled.

I’ll attempt to write what the loop needs to do in plain English here:

    for all days = NA, except day0,
        predict(non_close_day_sales) for each day
        add this new prediction to the data set
        reconfigure prior_week_average to include the new prediction + last 6 days
    but if day = day0, then
        predict(close_day_sales)
        add this prediction to the data set
    continue until all the NA’s/blanks are filled in
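For reference, here is one way the plain-English loop might look in R. It is a sketch under several assumptions, none of which are confirmed by the real data: the data is wide (one row per close date, columns Day0, Day1, ...), the models are refit on all rows with a known target before each prediction, and Close_Day_of_Week and Month are left out of the formulas to keep the toy example small. The function name `fill_left_nas()` and the synthetic numbers are hypothetical.

```r
set.seed(1)
n_days <- 12                                   # columns Day0 .. Day11
day_cols <- paste0("Day", 0:(n_days - 1))

# Synthetic history: 20 fully observed close dates (made-up numbers).
sales <- as.data.frame(matrix(round(runif(20 * n_days, 50, 150)), nrow = 20,
                              dimnames = list(NULL, day_cols)))
# Two open close dates with "left" NAs, mirroring 11/26 and 11/28.
open_rows <- sales[1:2, ]
open_rows[1, "Day0"] <- NA
open_rows[2, c("Day0", "Day1", "Day2")] <- NA
sales <- rbind(sales, open_rows)

fill_left_nas <- function(sales, day_cols) {
  for (i in seq_len(nrow(sales))) {
    # Position of the first observed value; everything before it is a "left" NA.
    first_known <- which(!is.na(unlist(sales[i, day_cols])))[1]
    if (is.na(first_known) || first_known == 1) next
    for (x in (first_known - 2):0) {           # highest missing day first, Day0 last
      target <- paste0("Day", x)
      if (x == 0) {
        # Close day: predictor is the sum of every day except Day0.
        other <- setdiff(day_cols, "Day0")
        d <- data.frame(
          y = sales$Day0,
          Total_Not_Including_Close_Sale = rowSums(sales[, other], na.rm = TRUE)
        )
      } else {
        # Normal day: predictor is the mean of the 7 days before day x.
        week <- paste0("Day", (x + 1):(x + 7))
        stopifnot(all(week %in% day_cols))
        d <- data.frame(
          y = sales[[target]],
          Prior_week_average = rowMeans(sales[, week])
        )
      }
      fit <- lm(y ~ ., data = d)               # rows with NA are dropped by lm
      # Write the prediction back so later iterations can use it.
      sales[i, target] <- predict(fit, newdata = d[i, , drop = FALSE])
    }
  }
  sales
}

filled <- fill_left_nas(sales, day_cols)
tail(filled[, c("Day0", "Day1", "Day2")], 2)
```

Because each prediction is written back before the next model is fit, the Day0 value predicted for the first open row becomes part of the training data when Day0 of the second open row is predicted. The extra predictors can be added back by including those columns in `d`.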

Another key thing to note here is that I only want to predict the values of NA’s to the left of non-NA values. For example, I do not want to predict Day76 of 11/26/2016 because the product wasn’t on sale yet.
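A small sketch of how the "left" NAs might be told apart from the NAs on the right, assuming each row is a vector ordered Day0, Day1, Day2, and so on. The helper name `left_na_days()` is hypothetical.

```r
# Hypothetical helper: return the day numbers (0-based) that are NA and
# sit to the left of the first observed value in one row.
left_na_days <- function(row_values) {
  first_known <- which(!is.na(row_values))[1]
  if (is.na(first_known)) return(integer(0))   # nothing observed: nothing to fill
  seq_len(first_known - 1) - 1L
}

# Day0..Day2 still to come, Day3..Day4 observed, Day5 before the product went on sale.
left_na_days(c(NA, NA, NA, 5, 6, NA))  # 0 1 2
left_na_days(c(7, 8, NA))              # integer(0): the trailing NA is ignored
```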

I appreciate your help and please let me know if there’s something I can add to clarify anything. I've added a link to the data set below.

Link to dataset

CghostK
  • Welcome to SO! Please read [ask] and give a [mcve] in your question! What have you tried so far? What is your R programming problem? In the current form your question is too broad. – jogo Nov 28 '18 at 15:19
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show exactly that you've tried and where you are getting stuck. – MrFlick Nov 28 '18 at 15:19
