4

I have a dataset "rates" as below:

       Date  Price
1 2012-11-01 6.2411
2 2012-11-02 6.2415
3 2012-11-05 6.2454
4 2012-11-06 6.2456
5 2012-11-07 6.2437
6 2012-11-08 6.2429

> class(rate)
[1] "data.frame"

And try to transfer this dataset in to time series data by using R code:

rate<-ts(data = rate, start =1, freq=1)
> class(rate)
[1] "mts"    "ts"     "matrix"

> head(rate)
     Date  Price
[1,] 15645 6.2411
[2,] 15646 6.2415
[3,] 15649 6.2454
[4,] 15650 6.2456
[5,] 15651 6.2437
[6,] 15652 6.2429

As you can see, the dates become numbers. Thus I use the as.date() function:

rate[,1] <- as.Date(rate[,1],origin = "1899-12-30")
> head(rate)
       Date  Price
[1,] -719162 6.2411
[2,] -718797 6.2415
[3,] -718432 6.2454
[4,] -718067 6.2456
[5,] -717701 6.2437
[6,] -717336 6.2429

Is there anyone could help me with this problem? Thank you.

Yanyan
  • 189
  • 1
  • 4
  • 14
  • You can use the `xts` package i..e `library(xts); xts(Date = rate[,1], order.by = as.Date(rate[,1]))` – akrun Nov 29 '16 at 06:21
  • Please remember to use `dput()` to share your data in the future – Hack-R Nov 29 '16 at 06:24
  • In many cases, the Date data is better excluded from the analyses. "Date" data are just labels, nothing else. If someone else uses other labels (different days), s/he will get the same results after analysis as long as the data of other variables are the same. Especially, in these irregularly dated data, it is very useful to assign the Date column as the rownames of the dataframe you are using. Otherwise (if you keep Date column in your dataframe), you should exclude it whenever you make a critical operation. For instance, analyzing whether Date column is stationary is meaningless. – Erdogan CEVHER Nov 29 '16 at 14:02
  • Don't forget to accept the best answer by clicking the grey check mark under the downvote button – acylam Oct 02 '17 at 18:40

2 Answers2

2

The problem with your code is that you tried to convert the Date column to type date after you coerced the whole dataframe to a time series matrix. The proper way of doing this is to first convert "Date" to type Date, calculate starting date in terms of daily increments from starting year (2012), then use that information to convert "Price" column to time series.

# Here is your data in "dput" form
rate = structure(list(Date = c("2012-11-01", "2012-11-02", "2012-11-05", 
                           "2012-11-06", "2012-11-07", "2012-11-08"), 
                  Price = c(6.2411, 6.2415, 6.2454, 6.2456, 6.2437, 6.2429)), 
             .Names = c("Date", "Price"), class = "data.frame", row.names = c(NA, -6L))

# Convert Date column to type "Date"
rate$Date = as.Date(rate$Date, format = "%Y-%m-%d")

# Convert "11-01" to day of the year 
dayOfYear = as.numeric(format(rate[1,1], "%j"))

# Use 2012 and dayOfYear as starting date
rate_ts = ts(rate$Price, start = c(2012, dayOfYear), frequency = 365)

> class(rate_ts)
[1] "ts"

> rate_ts
Time Series:
Start = c(2012, 306) 
End = c(2012, 311) 
Frequency = 365 
[1] 6.2411 6.2415 6.2454 6.2456 6.2437 6.2429

Here, "%j" simply tells the format.Date function to convert the first element of Date (2012-11-01) to day of the year.

I would also like to point out that since your ts is daily, you should use frequency = 365 instead of frequency = 1.

Plotting

# Plot time series without x-axis
plot(rate_ts, ylab = "Price", xaxt = "n")

# Extract first and last date value of rate_ts
tsp = attributes(rate_ts)$tsp

# Plot x-axis
axis(1, at = seq(tsp[1], tsp[2], along = rate_ts), 
     labels = format(rate$Date, "%Y-%m-%d"))

This last line allows you to format the x-axis however you want by changing the second argument of format(). The at = argument allows you to specify the ticks.

enter image description here

Credits to Jake Burkhead for plotting method in this answer

Community
  • 1
  • 1
acylam
  • 18,231
  • 5
  • 36
  • 45
  • The `ts` data is daily, but _irregularly_ daily. My solution is robust to _any_ irregularity besides the classical "absence of weekend data" irregularity. – Erdogan CEVHER Nov 29 '16 at 07:47
  • @ErdoganCEVHER Op simply asked for a way to convert his dataframe to a time series object. I provided that solution and pointed out likely causes of his error. On the other hand, no where in your answer did you provide _any_ code to solve op's issue. Talking about robustness when the op asked for a simple conversion seems off-topic to me. Furthermore, while `ts()` would automatically coerce his ts, the default setting by no means coerce it correctly. And you did not state how it should be done. – acylam Nov 29 '16 at 07:55
  • The OP has no necessity to use `ts()` for coercing operations in R's functions. Assume, for example, the OP will perform stationarity analysis of his time series. Then, `library(fUnitRoots); unitrootTest(df[,1])` automatically performs the job. The coercing is achieved _internally_ within the R function `unitrootTest`. What I mean is that there are many R functions that handle automatic coercing. That's why, the OP has almost _nothing_ about class conversions. Even he can perform very sophisticated operations without any handling on `df[,1]`. – Erdogan CEVHER Nov 29 '16 at 08:09
  • @ErdoganCEVHER Sure. Don't get me wrong, I agree that there are tons of ways of doing the same thing in R. However, in _this_ particular question, the op is interested in converting a `data.frame` to a `ts` object (Title clearly says: "Transfer daily “data.frame” to “ts” in R"). Whether he _wants_ to convert it to a `ts` object that is up to him. – acylam Nov 29 '16 at 08:14
  • Thank you for helping, but still have problems. By following your steps, after : 'rate$Price = ts(rate$Price, start = c(2012, dayOfYear), frequency = 365)', the 'class(rate)' is still 'data.frame'. And if I use 'rate= ts(data=rate, start = c(2012, dayOfYear), frequency = 365)', the result will be the same as I got in the question. Could you please check again? – Yanyan Nov 29 '16 at 13:45
  • @Yanyan `class(rate)` would still be a data.frame because it is a data.frame with a "Date" column and a "Time-Series" column. Try `class(rate$Price)` instead, after `rate$Price = ts(rate$Price, start = c(2012, dayOfYear), frequency = 365)`. – acylam Nov 29 '16 at 16:54
  • @Yanyan I edited my answer to store only the new `rate$Price` column as a time series. Now `class(rate_ts)` is of type "ts". – acylam Nov 29 '16 at 23:50
  • @useR Yes, I got it now. Thank you. I asked for the class of date is that I try to plot these date on the x-axis. By using your steps, the contents on x-axis are numbers, 100, 200, ..... – Yanyan Nov 30 '16 at 18:17
  • @Yanyan Although I feel that this has exceeded the scope of the question, I have edited my answer to give a simple example of how to plot the ts with correct labels. – acylam Nov 30 '16 at 23:39
0
df <- data.frame(c(6.2411, 6.2415, 6.2454, 6.2456, 6.2437, 6.2429))
names(df) <- "price"
rownames(df) <- c("2012-11-01", "2012-11-02", "2012-11-05", "2012-11-06", "2012-11-07", "2012-11-08")
df
#            price
# 2012-11-01 6.2411
# 2012-11-02 6.2415
# 2012-11-05 6.2454
# 2012-11-06 6.2456
# 2012-11-07 6.2437
# 2012-11-08 6.2429

You are trying to assign irregular days. Hence, lubridate may not be recite to your problem. When you use df[,1] for your further operations that needs a ts object, df[,1] will be automatically coerced to this class. You can perform any analysis you wish with df[,1].

Please note that: The observation points are non-equally (unevenly/irregularly) spaced (national holidays etc. may result in the difference in the number of observations). You may thought the intersections when multiple series are in consideration along with price. You should neglect the effect of the irregularity in such a case since the spacing of the observations would be the same for many observations, and therefore not so highly irregular and also consider the fact that transforming the data into equally spaced observations using linear interpolation can introduce a number of significant and hard to quantify biases (See: Scholes and Williams).

M. Scholes and J. Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5: 309–327, 1977.

Generalization over Excel2016:
Assume you have thousands of dates and price data in Excel file (DatePrice.xlsx):

   A   B    
1 Date Price    
2 2012-11-01 6.2411
3 2012-11-02 6.2415
...

Then, do the following:

library(readxl)
# Use the path returned from getwd() function that is R's working directory
df <- as.data.frame(read_excel("C://Users//User//Documents//Revolution//DatePrice.xlsx"))
names(df) <- c("date","price")
rownames(df) <- df[,1]
df[,1] <- NULL
df

Again, df[,1] will be the time series that will be used in coercing operations in any further analysis. For example;

An exemplary analysis of how to proceed with the solution:

price <- df[,1]
plot(ts(price)); abline(a=mean(ts(price)), b=0) # graphically, price~ I(1)

#Stationarity analysis (even with 6 obs, it produces results!)
library(fUnitRoots); unitrootTest(price) # formally, price~ I(1)  p=0.6889

plot(diff(ts(price), differences=1)) # graphically, Delta(price) ~ I(0)
unitrootTest(diff(ts(price), differences=1)) # formally, Delta(price) ~ I(0) p=1e-04<0.05
Erdogan CEVHER
  • 1,788
  • 1
  • 21
  • 40
  • The way you showed it, `df[,1]` is just the Date column, not a time series. Even if you meant `df[,2]`, it is still _not_ of type "Time-Series" in R. From your code, the Price column would just be "numeric". Check `class(df[,2])`. – acylam Nov 29 '16 at 08:06
  • In _both_ of the solutions (non-Excel, Excel), `df[,1]` is the Price series, not the Date column. You are missing "row assignment of dates and using only Price series" in non-Excel sol. and missing "df[,1] <- NULL" assignment in Excel sol. `class(df[,1])` is "numeric" in both of the solutions. But this has _nothing_ to do with the problem type he handles. See `library(fUnitRoots); unitrootTest(df[,1])`. No `ts()` was needed! The OP's problem is particular in that it uses _irregular_ dates. For this type of problem, the solution above is very very practical. – Erdogan CEVHER Nov 29 '16 at 08:19
  • I wonder whether the down-voter really understand what the scope of my answer is... – Erdogan CEVHER Aug 25 '17 at 12:15