
I'm working with a dataset `df` of daily sales for different products over a 2-year span.

Unfortunately, I have missing observations, as well as days on which some (actually many) products were not sold (I'm inspecting the sales of more than 2000 products). Here is a sample of my data: the number of sales per date for a given category of products. As you can see, '2014-01-09' and '2014-01-15' are missing.

  date     number
2014-01-06  1439
2014-01-07   985
2014-01-08  1202
2014-01-10  1439
2014-01-11  2862
2014-01-12  1542
2014-01-13   990
2014-01-14   562
2014-01-16  1254
2014-01-17  1419
2014-01-18  2667
2014-01-19  1513

I created an xts object with `ts <- xts(number,date,by=1)` and plotted it with `plot(ts,xlab='',ylab='sales')`:

[Plot 1: output of `plot(ts,xlab='',ylab='sales')`]

As you can see, there is a huge number of missing observations, especially between April and July 2014.

However, when I use `plot.ts(ts)` I get:

[Plot 2: output of `plot.ts(ts)`]

First of all, there are 618 observations, which is fewer than the 729 days of the 2-year span I am considering. But whereas the first plot clearly shows the gaps left by the missing values, the second one looks as if I had a period of 618 consecutive days with no missing observations.

My issue is that I have a huge dataset with millions of rows and thousands of products, so I would like to work on the output alone, without adding the missing dates with NAs in the 'number' column.
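To make the mismatch between observations and calendar days concrete, here is a minimal sketch on the sample rows above. It assumes the `xts` package is installed; the names `sales`, `all_days` and `missing_days` are illustrative and not part of the original code.

```r
library(xts)

# Sample rows from the question; 2014-01-09 and 2014-01-15 are absent.
date <- as.Date(c("2014-01-06", "2014-01-07", "2014-01-08", "2014-01-10",
                  "2014-01-11", "2014-01-12", "2014-01-13", "2014-01-14",
                  "2014-01-16", "2014-01-17", "2014-01-18", "2014-01-19"))
number <- c(1439, 985, 1202, 1439, 2862, 1542, 990, 562,
            1254, 1419, 2667, 1513)

# Illustrative name; the question calls this object `ts`.
sales <- xts(number, order.by = date)

# Every calendar day between the first and last observation.
all_days <- seq(start(sales), end(sales), by = "day")

# Calendar days with no observation in the series.
missing_days <- all_days[!all_days %in% index(sales)]

length(all_days)  # calendar days covered by the span
nrow(sales)       # observations actually present
missing_days      # dates that have no row
```

The same comparison on the full series should show the 618 observations against the 729 calendar days. As far as I understand, `plot.ts` converts its input to a regularly spaced series, which would explain why the second plot has no date axis and no visible gaps.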

Could you please help me with:

  1. Plotting the series in a given timeframe, like '2015-03-04' to '2016-03-04'.
  2. Adding the x-axis labels to my second plot while keeping the gaps as in the first one (I want a 729-day span, even though I have 618 observations).

Thank you in advance.

Tommaso Guerrini
  • What's wrong with using the first plot command? – r2evans Oct 15 '16 at 14:31
  • When I want to plot the ts in a given range, like start=c(2014-05-05), end=c(2014-09-05), nothing changes: 1. I tried initializing ts differently, passing the given range to xts, but the output did not change; 2. I passed it to plot(), but I still get the whole time series – Tommaso Guerrini Oct 15 '16 at 14:39
  • Using base R you could subset the dataframe you use for the plot. – Haboryme Oct 15 '16 at 14:49
  • Doing that for every product I study would be inefficient at best, as I wrote. – Tommaso Guerrini Oct 15 '16 at 14:52
  • I can't find documentation indicating you can do `start` and `end` in `xts` or `plot.xts`. I'm not a `ts` guru, so where are you finding that you can do that? (You might need to do manual subsetting, perhaps http://stackoverflow.com/questions/14101694/subsetting-in-xts-using-a-parameter-holding-dates helps?) – r2evans Oct 15 '16 at 14:54
  • That is, based on the data you provided, `ts["2014-01-10/2014-01-16"]` is meaningful. – r2evans Oct 15 '16 at 14:55
  • Thank you @r2evans ! That's the answer! Put it as answer and we're done :) – Tommaso Guerrini Oct 15 '16 at 15:10
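
The range subsetting suggested in the comments can be sketched as follows, using the sample rows from the question. This is only an illustration, assuming the `xts` package; `sales`, `window_sales`, `all_days` and `padded` are hypothetical names. The second half shows one possible way to get a date axis with visible gaps: pad a single series with NAs right before plotting, so the stored data never has to contain the missing dates.

```r
library(xts)

# Sample rows from the question; 2014-01-09 and 2014-01-15 are absent.
date <- as.Date(c("2014-01-06", "2014-01-07", "2014-01-08", "2014-01-10",
                  "2014-01-11", "2014-01-12", "2014-01-13", "2014-01-14",
                  "2014-01-16", "2014-01-17", "2014-01-18", "2014-01-19"))
number <- c(1439, 985, 1202, 1439, 2862, 1542, 990, 562,
            1254, 1419, 2667, 1513)
sales <- xts(number, order.by = date)

# 1. Restrict the series to a timeframe with an ISO-8601 "from/to" string,
#    as in r2evans' comment.
window_sales <- sales["2014-01-10/2014-01-16"]

# 2. Pad this one series to a full daily calendar just for the plot.
#    Merging with a zero-column xts adds the missing dates as NA rows.
all_days <- seq(start(sales), end(sales), by = "day")
padded <- merge(sales, xts(order.by = all_days))

# Base graphics breaks the line at NA values, and the Date x values
# give a calendar axis covering the whole span.
plot(index(padded), as.numeric(coredata(padded)), type = "l",
     xlab = "", ylab = "sales")
```

Because the padding is applied to one product's series at plotting time, the large source table stays free of NA rows. Whether this is fast enough across thousands of products is something to measure on the real data.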
