0

I have data for stocks, that I have shortened the time frame of in R, but when I go to graph it, the plot still shows blank values and dates I have removed. I also was trying to get R to read my date column as dates but it was not working. I noticed the problem with the dates when I was trying to run my ACF's and linear models.

> sp<-read.csv(file="~/Desktop/SP500.csv", header=T, sep=",",check.names=FALSE)
> sfshort=sp[1428:2049,]
> plot(sfshort)
> sfshort
           Date   Value
1428 2005-01-07 1186.19
1429 2005-01-14 1184.52
1430 2005-01-21 1167.87
1431 2005-01-28 1171.36
1432 2005-02-04 1203.03
1433 2005-02-11 1205.30
1434 2005-02-18 1201.59
1435 2005-02-25 1211.37
1436 2005-03-04 1222.12
1437 2005-03-11 1200.08
1438 2005-03-18 1189.65
1439 2005-03-24 1171.42
1440 2005-04-01 1172.92
1441 2005-04-08 1181.20
1442 2005-04-15 1142.62

First few points are listed above. Here is the code that I used to try and convert the date column. I have used this in older projects so I am not sure why it is not working.

 sfshort <- as.Date(spfshort$Date, "%m/%d/%y")

The date in my excel file was listed as MM/DD/YYYY. Here is what I was seeing when I went to run the linear model.

> lm(sfshort$Value~sfshort$Date)

Call:
lm(formula = sfshort$Value ~ sfshort$Date)

Coefficients:
           (Intercept)  sfshort$Date2005-01-14  sfshort$Date2005-01-21  
               1186.19                   -1.67                  -18.32  
sfshort$Date2005-01-28  sfshort$Date2005-02-04  sfshort$Date2005-02-11  
                -14.83                   16.84                   19.11  
jay.sf
  • 60,139
  • 8
  • 53
  • 110
subrinarafiq
  • 59
  • 1
  • 8
  • Try: `sfshort <- as.Date(sfshort$Date, "%Y-%m-%d")`. See `?strptime` for the correct formats you should use. – Edward Mar 02 '20 at 03:58
  • And if the data source in an Excel file, I strongly suggest that you use the `read_excel` function from the __readxl__ package. It is more convenient for importing data that contain dates. – Edward Mar 02 '20 at 04:02
  • @Edward It is a csv file and when I run the code, both yours and mine it just rewrite the entire data with just dates, instead of just the "Date" column. This is what I see when I call sfshort again, [1] "2005-01-07" "2005-01-14" "2005-01-21" "2005-01-28" "2005-02-04" – subrinarafiq Mar 02 '20 at 04:04
  • Yes, but your `read.csv()` command did not specify the "class" for the date column ("Date"), so you need to tell R that the data is in fact a "Date" class object. It may look like a date, but it is not. So either convert it into a true date using the correct `format` argument of `as.Date()`, use the `colClasses` argument in `read.csv()`, or use `read_excel()`. – Edward Mar 02 '20 at 04:07
  • You said: "The data in my Excel file...". Was that true or not? – Edward Mar 02 '20 at 04:08
  • What @Edward most likely meant is `sfshort$Date <- as.Date(sfshort$Date, "%Y-%m-%d")` to just convert the Date column. – jay.sf Mar 02 '20 at 04:57
  • 1
    Yeah. My bad. Forgot to include `$Date`. >. – Edward Mar 02 '20 at 06:11
  • For the linear model, including "Date" as a factor seems wrong, or at best ill-advised. Maybe convert it to the number of days since "whenever" where "whenever" could be the first date and fit various types of models depending on the relationship you see from the graph, or better, convert to a time series.... but that's a bit off-topic. – Edward Mar 02 '20 at 06:15
  • And there's just been a post on this very topic! Oh the irony (or is it just coincidence?) https://stackoverflow.com/questions/60483201/transforming-a-dataframe-into-a-ts-in-r – Edward Mar 02 '20 at 06:17
  • 1
    Thanks! I see how I was just rewriting the entire file instead of just the Date columns, doing @jay.sf code fixed majority of the problems I was having. I also converted it into a time series which also helped the way everything was running. – subrinarafiq Mar 03 '20 at 03:56

1 Answers1

0

The plotting problem arises from the fact that your Date column is most likely of class "factor". As mentioned in comments you need to convert to date, or even better POSIXlt., where I recommend using strptime. (Or do it right away while reading the data.)

Here a small demonstration. I read in your example data text, which is similar to what read.csv does. For sake of comparison I'll create three data sets sfshort.1, sfshort.2, sfshort.3.

sfshort.1 <- read.table(text=
"          Date   Value
1428 2005-01-07 1186.19
1429 2005-01-14 1184.52
1430 2005-01-21 1167.87
1431 2005-01-28 1171.36
1432 2005-02-04 1203.03
1433 2005-02-11 1205.30
1434 2005-02-18 1201.59
1435 2005-02-25 1211.37
1436 2005-03-04 1222.12
1437 2005-03-11 1200.08
1438 2005-03-18 1189.65
1439 2005-03-24 1171.42
1440 2005-04-01 1172.92
1441 2005-04-08 1181.20
1442 2005-04-15 1142.62")

Now, since it neither looks like integer or floating point numbers, the Date column is read in as factor by default:

str(sfshort.1)
# 'data.frame': 15 obs. of  2 variables:
# $ Date : Factor w/ 15 levels "2005-01-07","2005-01-14",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Value: num  1186 1185 1168 1171 1203 ...

The factor levels should be identical to the unique dates.

identical(unique(as.character(sfshort.1$Date)), levels(sfshort.1$Date))
# [1] TRUE

Let's remove the first row, and look what happens.

sfshort.2 <- sfshort.1[-1, ]
identical(unique(as.character(sfshort.2$Date)), levels(sfshort.2$Date))
# [1] FALSE

The factor levels are unchanged (and will all appear in the plot).

Now, let's convert Date column to class POSIXlt,

sfshort.3 <- transform(sfshort.2, Date=strptime(Date, "%Y-%m-%d"))

str(sfshort.3)
# 'data.frame': 14 obs. of  2 variables:
# $ Date : POSIXlt, format: "2005-01-14" "2005-01-21" "2005-01-28" "2005-02-04" ...  ## <- CHANGED!
# $ Value: num  1185 1168 1171 1203 1205 ...

and plot all three versions:

op <- par(mfrow=c(2, 2))
plot(sfshort.1, main="sfshort.1", border=c(2, rep(1, 14)))
plot(sfshort.2, main="sfshort.2")
plot(sfshort.3, main="sfshort.3")
par(op)

enter image description here

We can see sfshort.2 still shows "2005-01-07" although it was deleted. In sfshort.3 the issue is eliminated and the data is correctly recognized as date.

Note, that in lm you need to use Date as a factor column again, by doing:

lm(Value ~ factor(Date), data=sfshort)
jay.sf
  • 60,139
  • 8
  • 53
  • 110