R data frame reshape, restructure, and/or merge

Question

I'm trying to reshape and "expand" data.frame based on the values contained in the data.frame. Below is the structure of the data frame that I am starting with:

Starting structure:

'data.frame':   9 obs. of  5 variables:
 $ Delivery.Location    : chr  "Henry" "Henry" "Henry" "Henry" ...
 $ Price                : num  2.97 2.96 2.91 2.85 2.89 ...
 $ Trade.Date           : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
 $ Delivery.Start.Date  : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
 $ Delivery.End.Date    : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...

The market where this price data comes from is called the "next day market" since the physical delivery of natural gas is typically the day after the natural gas was traded (i.e. Trade.Date above). I emphasize typically because there are exceptions that occur on weekends and holidays in which case the delivery period might be for multiple day period (i.e. 2-3 days). However, the data structure is provided with variables that explicitly state the Delivery.Start.Date and Delivery.End.Date.

I'm trying to restructure the data.frame in the following way to produce some time series charts and do additional analysis:

Desired structure:

$ Delivery.Location
$ Trade.Date
$ Delivery.Date    <<<-- How do I create this variable? 
$ Price

How do I create the Delivery.Date variable based on both the existing Delivery.Start.Date and Delivery.End.Date variables?

In other words, the data from the 2012-01-06 Trade.Date looks like this:

Delivery Location   Price      Trade.Date      Delivery.Start.Date     Delivery.End.Date     
Henry               2.851322    2012-01-06     2012-01-07              2012-01-09

I want to somehow "fill in" Delivery.Location & Price for 2012-01-08 to get something like this:

Delivery Location     Price      Trade.Date      Delivery.Date
Henry                 2.851322    2012-01-06     2012-01-07   
Henry                 2.851322    2012-01-06     2012-01-08   <--new record "filled in"
Henry                 2.851322    2012-01-06     2012-01-09

Below is a subset example of my data.frame:

##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"), Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745),  Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"), Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"),  Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")), .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"), row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L), class = "data.frame")

str(df)

##--------------------------------------------------------------------------------------------   
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date)))
Delivery.Date <- seq(rng[1], rng[2], by=1)

Any assistance or general direction would be greatly appreciated.

@Metrics: I have edited my question to hopefully make it more clear. My apologies for not being more specific from the onset. — MikeTP, Jul 11 '13 at 23:36
NP; do you want the difference between start and end date for delivery? — Metrics, Jul 12 '13 at 00:10
@metrix. No don't want difference just want the price to be the same for all dates between the start date and end date — MikeTP, Jul 12 '13 at 00:24

cryo111 · Accepted Answer · 2013-07-12T01:03:55.890

You can use ddply from the plyr package

library(plyr)
ddply(
      df,
      c("Delivery.Location","Trade.Date"),
      function(trade)
      data.frame(
      trade,
      Delivery.Date=seq(
          from=trade$Delivery.Start.Date,
          to=trade$Delivery.End.Date,
          by="day")
      )
 )

Of course, you would still have to implement the logic regarding the weekends, holidays, etc.

I have also assumed that Delivery.Location and Trade.Date are sufficient to identify a single trade.

score 1 · Answer 2 · edited May 23 '17 at 12:16

Is this okay?

library(plyr)   



lookuptable<-df[,2:3]

Trade.Date<-df[,4]
filluptable1<-as.data.frame(Trade.Date)
Trade.Date<-df[,5]
filluptable2<-as.data.frame(Trade.Date)

myfillstart<- join(filluptable1, lookuptable, by = "Trade.Date")
myfillstart<- rename(myfillstart, c(Trade.Date="Delivery.Start.Date"))
myfillstart<- rename(myfillstart, c(Price="Price.Start.Date"))
myfillend<- join(filluptable2, lookuptable, by = "Trade.Date")
myfillend<- rename(myfillend, c(Trade.Date="Delivery.End.Date"))
myfillend<- rename(myfillend, c(Price="Price.End.Date"))
finaldf<-cbind(df[,1:3],myfillstart,myfillend)



finaldf
    Delivery.Location    Price Trade.Date Delivery.Start.Date Price.Start.Date Delivery.End.Date Price.End.Date
35              Henry 2.965398 2012-01-03          2012-01-04         2.959077        2012-01-04       2.959077
150             Henry 2.959077 2012-01-04          2012-01-05         2.906436        2012-01-05       2.906436
263             Henry 2.906436 2012-01-05          2012-01-06         2.851322        2012-01-06       2.851322
377             Henry 2.851322 2012-01-06          2012-01-07               NA        2012-01-09       2.890364
493             Henry 2.890364 2012-01-09          2012-01-10         2.965585        2012-01-10       2.965585
607             Henry 2.965585 2012-01-10          2012-01-11         2.807734        2012-01-11       2.807734
724             Henry 2.807734 2012-01-11          2012-01-12         2.702072        2012-01-12       2.702072
838             Henry 2.702072 2012-01-12          2012-01-13         2.671732        2012-01-13       2.671732
955             Henry 2.671732 2012-01-13          2012-01-14               NA        2012-01-17             NA

Note: Since you have the same location, I didn't look up for the location. But, you can do the same. The code looks a little bit messy. Here are the alternatives which you can also go through.

R data frame reshape, restructure, and/or merge

2 Answers2