I'm trying to reshape and "expand" data.frame based on the values contained in the data.frame. Below is the structure of the data frame that I am starting with:
Starting structure:
'data.frame': 9 obs. of 5 variables:
$ Delivery.Location : chr "Henry" "Henry" "Henry" "Henry" ...
$ Price : num 2.97 2.96 2.91 2.85 2.89 ...
$ Trade.Date : Date, format: "2012-01-03" "2012-01-04" "2012-01-05" "2012-01-06" ...
$ Delivery.Start.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-07" ...
$ Delivery.End.Date : Date, format: "2012-01-04" "2012-01-05" "2012-01-06" "2012-01-09" ...
The market where this price data comes from is called the "next day market" since the physical delivery of natural gas is typically the day after the natural gas was traded (i.e. Trade.Date
above). I emphasize typically because there are exceptions that occur on weekends and holidays in which case the delivery period might be for multiple day period (i.e. 2-3 days). However, the data structure is provided with variables that explicitly state the Delivery.Start.Date
and Delivery.End.Date
.
I'm trying to restructure the data.frame in the following way to produce some time series charts and do additional analysis:
Desired structure:
$ Delivery.Location
$ Trade.Date
$ Delivery.Date <<<-- How do I create this variable?
$ Price
How do I create the Delivery.Date
variable based on both the existing Delivery.Start.Date
and Delivery.End.Date
variables?
In other words, the data from the 2012-01-06 Trade.Date looks like this:
Delivery Location Price Trade.Date Delivery.Start.Date Delivery.End.Date
Henry 2.851322 2012-01-06 2012-01-07 2012-01-09
I want to somehow "fill in" Delivery.Location & Price for 2012-01-08 to get something like this:
Delivery Location Price Trade.Date Delivery.Date
Henry 2.851322 2012-01-06 2012-01-07
Henry 2.851322 2012-01-06 2012-01-08 <--new record "filled in"
Henry 2.851322 2012-01-06 2012-01-09
Below is a subset example of my data.frame:
##--------------------------------------------------------------------------------------------
## sample data
##--------------------------------------------------------------------------------------------
df <- structure(list(Delivery.Location = c("Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry", "Henry"), Price = c(2.96539814293754, 2.95907652120467, 2.9064360152398, 2.85132233314846, 2.89036418816388,2.9655845029802, 2.80773394495413, 2.70207160426346, 2.67173237617745), Trade.Date = structure(c(15342, 15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352), class = "Date"), Delivery.Start.Date = structure(c(15343, 15344, 15345, 15346, 15349, 15350, 15351, 15352, 15353), class = "Date"), Delivery.End.Date = structure(c(15343, 15344, 15345, 15348, 15349, 15350, 15351, 15352, 15356), class = "Date")), .Names = c("Delivery.Location", "Price", "Trade.Date", "Delivery.Start.Date", "Delivery.End.Date"), row.names = c(35L, 150L, 263L, 377L, 493L, 607L, 724L, 838L, 955L), class = "data.frame")
str(df)
##--------------------------------------------------------------------------------------------
## create sequence of Delivery.Dates to potentially use
##--------------------------------------------------------------------------------------------
rng <- range(c(range(df$Delivery.Start.Date), range(df$Delivery.End.Date)))
Delivery.Date <- seq(rng[1], rng[2], by=1)
Any assistance or general direction would be greatly appreciated.