
I am trying to predict the day-to-day number of customers coming into a store using an ARIMA model.

I am currently using R to build this model. However, the data I have is inconsistent: some dates are missing entirely. The attached picture is an example of my problem.

In this example, I have 4 customers coming into the store on 3/14, 3 customers on 3/13, and 0 customers on 3/12 and 3/11. Since I want to predict the number of customers coming into the store, I will group the data by date. If I group by date, I will have to insert 0 customers for 3/12 and 3/11, because those dates are not in my database. My questions are:

1. How can I automatically insert the missing dates in R?
2. Will inserting them affect the accuracy of my model?
3. In this case, would I get better results predicting week by week instead of day by day?

Thanks

enter image description here

Does anyone know what I should do? Can I still make predictions on a day-to-day basis? Is there any way I could fix this in R?

Bonjuga Lewis
  • ?`na.approx` from package `zoo` – Pierre Lapointe Jul 03 '17 at 20:00
  • On the other hand, you seem to have multiple entries for the 14th. From what I see, you have 4 customers on the 10th, 0 on the 11th, 0 on the 12th, 3 on the 13th, and 4 on the 14th. – Sci Prog Jul 04 '17 at 02:53
  • Use `Date` class objects, not just strings. You can use `seq.Date` to get a complete list of days from your min day to your max date, then `merge` to make sure you have 1 row per day. Examples (possible dupes?) [here](https://stackoverflow.com/q/29152654/903061) and [here](https://stackoverflow.com/q/39525750/903061) – Gregor Thomas Jul 06 '17 at 16:20
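
The `seq.Date` plus `merge` approach suggested in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the `visits` data frame, its `date` column, and the year 2017 are hypothetical stand-ins for the real database, with counts taken from the comment above (4 on the 10th, 3 on the 13th, 4 on the 14th). Days with no rows are filled with 0, on the assumption that a missing date really means no customers rather than unrecorded data. The weekly roll-up and the `frequency = 7` setting at the end are likewise assumptions, not something the question confirms.

```r
# Hypothetical sample data: one row per customer visit.
visits <- data.frame(
  date = as.Date(c(rep("2017-03-10", 4),
                   rep("2017-03-13", 3),
                   rep("2017-03-14", 4)))
)

# Count visits per day. Days without visits are still absent here.
counts <- table(visits$date)
daily <- data.frame(date = as.Date(names(counts)),
                    customers = as.integer(counts))

# Build the complete calendar from the first to the last observed day.
all_days <- data.frame(date = seq(min(daily$date), max(daily$date), by = "day"))

# Left join onto the calendar so every day has exactly one row,
# then replace the NA counts of the missing days with 0.
full <- merge(all_days, daily, by = "date", all.x = TRUE)
full$customers[is.na(full$customers)] <- 0L

# Optional: weekly totals, for comparing a week-by-week model.
# "%U" numbers weeks starting on Sunday.
full$week <- format(full$date, "%Y-%U")
weekly <- aggregate(customers ~ week, data = full, FUN = sum)

# Daily series with an assumed weekly seasonality, ready for an ARIMA fit.
counts_ts <- ts(full$customers, frequency = 7)
```

With the gaps filled, `full` has one row per calendar day, so the series passed to the model is evenly spaced. Whether daily or weekly totals forecast better is something to check on the real data, for example by comparing errors on a held-out period.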

0 Answers