-1

i obtained data in RDBMS with SQL and want to forecast the daily purchase using R.

Here is the first 12 rows of the data . first 12 data

What i want to make is store the dataframe like in the image below, and in the end i will try to write function to Forecast it by each item title which is in the rows using exponential smoothing. Purpose of dataframe

So far, i have succesfully done the title column. But i cannot make multiple date column exactly like the 2nd image above. Here is the code so far :

df1 <- data.frame() 
dailydate <- as.Date(as.POSIXct(data$date_placed))
newdate <- unique(dailydate)
itemtitle <- as.character(data$title)
newitemtitle <- unique(itemtitle)
df1 <- data.frame(newitemtitle,t(dailydate))
Error in data.frame(newitemtitle, t(dailydate))

I cannot add new column into df1 ,and also not yet find the way to match the daily quantity based on the title. I am open to any suggestion with this problem

user3292755
  • 383
  • 2
  • 9
  • 25

3 Answers3

2

This is a good place to use the reshape2 package.

df1 <- structure(list(title = structure(c(5L, 3L, 6L, 1L, 7L, 2L, 1L, 
4L, 8L, 3L), .Label = c("d", "k", "m", "n", "q", "t", "u", "v"
), class = "factor"), quantity = c(4L, 3L, 5L, 10L, 6L, 13L, 
4L, 6L, 12L, 1L), date_placed = structure(c(1L, 1L, 1L, 2L, 2L, 
3L, 3L, 4L, 5L, 5L), .Label = c("8/24/2013", "8/25/2013", "8/26/2013", 
"8/27/2013", "8/28/2013"), class = "factor")), .Names = c("title", 
"quantity", "date_placed"), row.names = c(NA, -10L), class = "data.frame")

#install.packages("reshape2")
reshape2:::dcast(df1, title ~ date_placed, value.var = "quantity", fill = 0)

Result:

#  title 8/24/2013 8/25/2013 8/26/2013 8/27/2013 8/28/2013
#1     d         0        10         4         0         0
#2     k         0         0        13         0         0
#3     m         3         0         0         0         1
#4     n         0         0         0         6         0
#5     q         4         0         0         0         0
#6     t         5         0         0         0         0
#7     u         0         6         0         0         0
#8     v         0         0         0         0        12

The benefit of this over the other answer is that the output is a data.frame that can now be manipulated as you wish, instead of a table.

Chrisss
  • 3,211
  • 1
  • 16
  • 13
  • thanks for the library @Chrisss , i have used the ``reshape2`` package using ``dcast`` and ``melt`` function. However, there is one more obstacle. Now i have 3 columns (``title``,``quantity``,``date_placed``) where titles are duplicated but with different ``date_placed`` and ``quantity`` . How am i possible to do forecast with duplicated ``title`` rows ? – user3292755 Sep 25 '16 at 07:00
  • You should really use `dput()` on your data to help us reproduce your problem if you want a more detailed solution. As of now, I don't understand the problem. `dcast` is making a wide data.frame with the unique values of `title` as rows and the unique values of `date_placed` as the columns and the cells are populated by `quantity`. `title` should not be duplicated anywhere in the final data.frame – Chrisss Sep 25 '16 at 08:18
  • yes,i have recently use `dput()` for R, and have trouble in forecasting the `unique` title value – user3292755 Sep 25 '16 at 10:18
2

Another option is spread from tidyr

library(tidyr)
spread(df1, date_placed, quantity, fill = 0)
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Use this to transform your data

xtabs(data = df1,quantity~title+date_placed)

Data

df1 <- structure(list(title = structure(c(5L, 3L, 6L, 1L, 7L, 2L, 1L, 
4L, 8L, 3L), .Label = c("d", "k", "m", "n", "q", "t", "u", "v"
), class = "factor"), quantity = c(4L, 3L, 5L, 10L, 6L, 13L, 
4L, 6L, 12L, 1L), date_placed = structure(c(1L, 1L, 1L, 2L, 2L, 
3L, 3L, 4L, 5L, 5L), .Label = c("8/24/2013", "8/25/2013", "8/26/2013", 
"8/27/2013", "8/28/2013"), class = "factor")), .Names = c("title", 
"quantity", "date_placed"), row.names = c(NA, -10L), class = "data.frame")
user2100721
  • 3,557
  • 2
  • 20
  • 29