0

I've been trying to transform a dataframe I put together into a Time Series, but for some reason it doesn't work. I am very new to R.

    x<-Sales_AEMBG%>%
+   select(Ecriture.DatEcr, Crédit, Mapping)
> names(x)<-c("Dates","Revenue","Mapping")
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   15167 obs. of  3 variables:
 $ Dates  : POSIXct, format: "2016-01-02" "2016-01-02" "2016-01-02" "2016-01-02" ...
 $ Revenue: num  124065 214631 135810 225293 57804 ...
 $ Mapping: chr  "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" ...'

When I try to look at the data, here's what I have

> head(x)
# A tibble: 6 x 3
  Dates               Revenue Mapping  
  <dttm>                <dbl> <chr>    
1 2016-01-02 00:00:00 124065. E.M 1.5 L
2 2016-01-02 00:00:00 214631. E.M 1.5 L
3 2016-01-02 00:00:00 135810. E.M 1.5 L
4 2016-01-02 00:00:00 225293. E.M 1.5 L
5 2016-01-02 00:00:00  57804. E.M 1.5 L
6 2016-01-02 00:00:00 124065. E.M 1.5 L

Of course, I tried the as.ts function

 > x_xts <- as.ts(x)
Warning message:
In data.matrix(data) : NAs introduced by coercion
> is.ts(x)
[1] FALSE

But it keeps telling me that my dataframe is still not recognized as a TS.

What do you suggest ?

Thanks

Sofiane
  • 45
  • 7
  • A quick search here would find this: https://stackoverflow.com/questions/29046311/how-to-convert-data-frame-into-time-series – Edward Mar 02 '20 at 06:25
  • I notice that your dates are all the same, at least for the first 6 records. You'll have problems converting that data frame into a time series object... – Edward Mar 02 '20 at 07:18
  • @Edward if you see my data, you'll find that I have multiple transactions in the same day, and for the same products. So first I need to find how to sum sales by day and product type (there are 5 product types in the total dataframe) – Sofiane Mar 02 '20 at 21:41
  • OK. So do you need help with that ? – Edward Mar 02 '20 at 21:55
  • @Edward Yes I do ! – Sofiane Mar 02 '20 at 22:45

1 Answers1

0

I've added a few more observations to your data.

# A tibble: 12 x 3
   Dates               Revenue Mapping  
   <dttm>                <dbl> <chr>    
 1 2016-01-02 00:00:00  124065 E.M 1.5 L
 2 2016-01-02 00:00:00  214631 E.M 1.5 L
 3 2016-01-03 00:00:00  135810 E.M 1.5 L
 4 2016-01-03 00:00:00  225293 E.M 1.5 L
 5 2016-01-05 00:00:00   57804 E.M 1.5 L
 6 2016-01-05 00:00:00  124065 E.M 1.5 L
 7 2016-01-02 00:00:00   24065 E.M 1.5 M
 8 2016-01-02 00:00:00   14631 E.M 1.5 M
 9 2016-01-03 00:00:00   35810 E.M 1.5 M
10 2016-01-03 00:00:00   25293 E.M 1.5 M
11 2016-01-05 00:00:00    7804 E.M 1.5 M
12 2016-01-05 00:00:00   24065 E.M 1.5 M

First you need to sum the sales by day (Dates) and product type (your Mapping variable?), and pivot into a wider data format:

library(dplyr)
library(tidyr)

x.sum <- x %>%
  group_by(Mapping, Dates) %>%
  summarise(Revenue=sum(Revenue)) %>%
  pivot_wider(id_cols=Dates, names_from=Mapping, values_from=Revenue)

# A tibble: 3 x 3
  Dates               `E,M 1.5 L` `E,M 1.5 M`
  <dttm>                    <dbl>       <dbl>
1 2016-01-02 00:00:00      338696       38696
2 2016-01-03 00:00:00      361103       61103
3 2016-01-05 00:00:00      181869       31869

Note that I've deliberately omitted Jan 4.

If your time series data has missing days, such as stock prices where financial markets are closed on the weekends, then using the as.ts (or ts) function won't work. If there are no missing days, then then correct way to convert the data into a time series object ("ts") is to specify the column(s) to convert (x.sum[,2:3]) and the start (January 2, 2016) and frequency (daily) of the series.

x.ts <- ts(x.sum[,2:3], start=c(2016, 2), frequency=365)

Be careful with the start as the second argument depends on the specified frequency. Here, 365 means daily, so the "2" means day 2 of year 2016. If the frequency was monthly, the "2" would mean month 2 of year 2016.

But as I mentioned, ts doesn't ignore any missing days. So for this make-up data, if you plotted the time series, then you will get the wrong information.

In this case, other packages such as xts and zoo can be used to simply the work.

library(xts)
x.xts <- xts(x.sum[,2:3], order.by=x.sum$Dates)

plot(x.xts) # Correct results.

Other answers about time series can be found here and here.

Edward
  • 10,360
  • 2
  • 11
  • 26
  • GREAT ANSWER ! Thanks a lot Edward ! – Sofiane Mar 03 '20 at 01:54
  • Hello @Edward, I am working with your solution now. All good so far, but I can't figure out how in my case I can decompose the data to apply an ARIMA process. Since each year has a different number of days in it, I don't know how to build a summary by year to spot seasonality – Sofiane Mar 14 '20 at 05:37
  • I'm afraid you'll need to start a whole new question. Not because I don't want to help, or I can't help, but because you'll get a lot more response from others. But more importantly, this question is closed. – Edward Mar 14 '20 at 05:47