I am new to programming and am trying to build a prediction model for multiple articles. Using Excel or similar software is not possible for this task, so I have installed RStudio instead. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.

However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.

I have attached an image of my current dataset in CSV format: https://i.stack.imgur.com/AQJx1.png

Here is my dput(sales_data): structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "f\xe9vr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))

Here is the code I have put together so far with the help of blogs and websites:

library(forecast)
library(reshape2)

sales_data <- read.csv("sales_data.csv", header = TRUE)

sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")

for(i in 1:nrow(sales_data_long)) {
  
  sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i,"Code Article"])
  
  sales_ts <- ts(sales_data_article$value, start = c(2010,6), frequency = 12)
  
  arima_fit <- auto.arima(sales_ts)

  arima_forecast <- forecast(arima_fit, h = 18)
  
  print(arima_forecast)
  print(paste("Article:", sales_data_long[i, "Code Article"]))
}

With this code, RStudio gives me the following error: "Error: id variables not found in data: Code Article"

Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.

Do I need to modify my CSV file and separate each column using "," or ";"? Or, can I keep my data in its current format and make adjustments in the code instead?

Mekags
  • Can you provide `dput(sales_data)` instead of an image so your post is [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – jrcalabrese Jan 26 '23 at 16:25
  • Thank you for your reply. Here is my dput(sales_data) for 3 articles over 3 periods, with real sales figures: structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;490;640;90", "f\xe9vr-19;390;670;160", "mars-19;360;730;160")), class = "data.frame", row.names = c(NA, -3L)) – Mekags Jan 27 '23 at 08:05

1 Answer

I rebuilt the data from the dput output, as jrcalabrese requested, and switched from reshape2 to its replacement, tidyr, using pivot_longer. reshape2::melt raised the error because the data has no column named "Code Article"; the pivot_longer version below runs without error. The CSV structure doesn't matter that much, and yours was fine. Hope this helps! :-)

library(tidyr)

# Rebuild a small example data frame: one row per article, one column per month
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
                             `janv-19` = c(42, 56, 55),
                             `fev-19` = c(49, 58, 59),
                             `mars-19` = c(55, 38, 76)),
                        row.names = c(NA, 3L), class = "data.frame")

# Reshape to long format: one row per article/month pair
sales_data_long <- sales_data |> pivot_longer(!var1,
                                              names_to = "month",
                                              values_to = "count")
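
On the separator question: the dput output in the question shows a single column whose values look like "janv-19;42;49;55", which suggests the file is `;`-separated and `read.csv` (which expects commas) put everything into one column. Below is a minimal sketch of reading such a file as-is, assuming the layout shown in the question. The inline `csv_lines` vector is only a stand-in for `sales_data.csv`, the accent in "févr" is dropped to sidestep encoding issues, and the column names `month` and `article` are my own choices.

```r
library(tidyr)

# Stand-in for sales_data.csv: same layout and values as the dput() in the
# question (empty first header cell, fields separated by ";").
csv_lines <- c(
  ";Article 1;Article 2;Article 3",
  "janv-19;42;49;55",
  "fevr-19;56;58;38",
  "mars-19;55;59;76"
)

# read.csv2() defaults to sep = ";" and dec = ",". For the real file, something
# like read.csv2("sales_data.csv", fileEncoding = "latin1") may be needed so
# that accented month names are read correctly (an assumption about the file).
sales_data <- read.csv2(text = csv_lines, header = TRUE)

# The empty header cell is read as "X"; give it a clearer (hypothetical) name.
names(sales_data)[1] <- "month"

# One row per month/article pair.
sales_data_long <- pivot_longer(sales_data, !month,
                                names_to = "article",
                                values_to = "count")
print(sales_data_long)
```

With this approach the CSV itself does not have to change; only the reading function does. Note that `read.csv2` turns "Article 1" into `Article.1` unless `check.names = FALSE` is passed.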
Isaiah
  • Hi, thank you very much for your kind help. Can your example handle a file with 100 items (100 columns) and 48 periods (48 rows)? For example, is it possible to do something like this: `library(tidyr) sales_data <- structure(list(var1 = read.csv("sales_data.csv", header = TRUE) sales_data_long <- sales_data |> pivot_longer(!var1, names_to = "month", values_to = "count")` – Mekags Jan 27 '23 at 08:20
  • `sales_data <- read.csv("sales_data.csv", header = TRUE)` is the first thing I'd try. The `structure` call is just to make the code reproducible, for those without `sales_data.csv` – Isaiah Jan 27 '23 at 09:38
  • I much prefer [readr](https://www.tidyverse.org/blog/2021/07/readr-2-0-0/) and its function `read_csv` to `read.csv`. – Isaiah Jan 27 '23 at 09:51
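
For the wide layout raised in the comments (one row per month, one column per article), a rough sketch of fitting one ARIMA model per article could look like the following. It assumes the forecast package is installed, uses simulated numbers in place of the real file, and takes January 2019 as the start of the series because the question's data begins with "janv-19". The column names `month` and `Article.1` to `Article.3` are placeholders. The same loop should work unchanged for 100 article columns, since it runs over every column except `month`.

```r
library(forecast)

# Simulated stand-in for the real file: 48 monthly rows, one column per article.
set.seed(1)
sales_data <- data.frame(
  month     = format(seq(as.Date("2019-01-01"), by = "month", length.out = 48), "%Y-%m"),
  Article.1 = rpois(48, 50),
  Article.2 = rpois(48, 60),
  Article.3 = rpois(48, 55)
)

# Every column except the month label is treated as one article's sales series.
article_cols <- setdiff(names(sales_data), "month")

# Fit one model per article and forecast 18 months ahead.
forecasts <- lapply(article_cols, function(col) {
  sales_ts <- ts(sales_data[[col]], start = c(2019, 1), frequency = 12)
  forecast(auto.arima(sales_ts), h = 18)
})
names(forecasts) <- article_cols

# Inspect the forecast for a single article.
print(forecasts[["Article.1"]])
```

Keeping the data wide avoids reshaping altogether here; the long format from `pivot_longer` is mainly useful for plotting or grouped summaries.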