I want the x axis of this plot, made with plot() in R, to carry a label for the year in which most of the events occurred:

The plot shows stock market drops, with one of the main clusters around 2008. However, the software labels the tick mark for 2010 instead.

There are posts explaining how to select specific labels; yet how can I increase the density of labeled tick marks on the x axis to get an approximate year for the spikes if I don't know these years before plotting? [CLARIFICATION: I don't want to have to interrogate the data to find out that the cluster is in 2008. I just want to increase the number of labeled tick marks so that one of them falls close to the spike.]

Here is the code ready to copy and paste:

require(RCurl)
require(quantmod)                  # Provides chartSeries() and loads xts
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")

DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date          # Assigning Date to row names
DJI.raw <- DJI
DJI$Date <- NULL                   # Removing the Date column
chartSeries(DJI, type="auto", theme=chartTheme('white'))



# Function to calculate % change in closing price between days:
D2D = function (x) {              
  days = nrow(x)
  delta = numeric(days)
  for(i in 2:days){
    delta[i] <- (100*((x[i,1] - x[i - 1,1])/(x[i - 1,1])))
  }
  delta
}

z <- as.data.frame(DJI$Adj.Close)    # Subsetting closing price
DJI$InterDay <- D2D(z)               # Included as add'l column to DJI.
DJI.raw$InterDay <- DJI$InterDay

plot(DJI.raw$Date, DJI.raw$InterDay < -4, pch=19, col=2, type='h', 
     xlab="Year", ylab="Days with > 4% change", 
     cex.axis=.7, cex.main=.8, cex.lab =.8,las=2, 
     main  = "Clustering of big drop days")

FOLLOW-UP QUESTION:

If instead of formatting the data as above, I tried to consolidate it as an xts object as follows:

require(RCurl)
require(quantmod)                  # Provides chartSeries() and loads xts (as.xts)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date          # Assigning Date to row names
DJI$Date <- NULL                   # Removing the Date column
DJI <- as.xts(DJI)
chartSeries(DJI, type="auto", theme=chartTheme('white'))

time(DJI)[DJI$Close == min(DJI$Close)]


# Function to calculate % change in closing price between days:
D2D = function (x) {              
  days = nrow(x)
  delta = numeric(days)
  for(i in 2:days){
    delta[i] <- (100*((x[i,1] - x[i - 1,1])/(x[i - 1,1])))
  }
  delta
}

z <- as.data.frame(DJI$Adj.Close)    # Subsetting closing price
DJI$InterDay <- D2D(z)               # Included as add'l column to DJI.


plot(time(DJI), DJI$InterDay < -4, col=2, type='h', 
     xlab="Year", ylab="Days with > 4% change",
     cex.axis=.7, cex.main=.8, cex.lab =.8,las=2, 
     main  = "Clustering of big drop days")

How could I achieve a more informative x axis?

Antoni Parellada

2 Answers

You can use plot(..., xaxt="n") so that the x axis is not drawn, and then use axis(1, at=2000:2016) to add all your year labels.
By the way, I do not understand why you say that you do not know the years ahead of plotting, because this is your data! You can always count the observations in a range using cut or table. You can also simply round the min and max of your x data and use them as the first and last ticks in axis.
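
As a rough illustration of the counting idea, here is a minimal sketch that tabulates events per calendar year. The vector drop_dates and its values are made up for the example and are not taken from the DJI file.

```r
# Synthetic stand-in for the dates of the big-drop days (illustrative values only)
drop_dates <- as.Date(c("1987-10-19", "1987-10-26", "2008-09-29",
                        "2008-10-07", "2008-10-15", "2008-12-01",
                        "2011-08-08"))

# Count the events that fall in each calendar year
events_per_year <- table(format(drop_dates, "%Y"))
print(events_per_year)

# Name of the year holding the largest count
busiest_year <- names(events_per_year)[which.max(events_per_year)]
print(busiest_year)
```

With the real data, the same two lines would presumably be applied to DJI.raw$Date[DJI.raw$InterDay < -4].
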
With dates, you should use something like:

plot(DJI.raw$Date, DJI.raw$InterDay < -4, pch=19, col=2, type='h', 
     xlab="Year", ylab="Days with > 4% change", xaxt="n",
     cex.axis=.7, cex.main=.8, cex.lab =.8,las=2, 
     main  = "Clustering of big drop days")

axis(1, at=as.Date(paste0(1985:2016, "-01-01")), labels = 1985:2016)
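
If you would rather not type the years 1985:2016 by hand, the tick positions can be derived from the range of the dates. The following is a sketch on synthetic data; dates and event are hypothetical stand-ins for DJI.raw$Date and the indicator series.

```r
# Hypothetical stand-in for DJI.raw$Date: daily dates over an arbitrary span
dates <- seq(as.Date("1985-03-15"), as.Date("2016-08-30"), by = "day")

# Made-up indicator series with a few isolated events, just to have something to draw
event <- rep(0, length(dates))
event[c(950, 8600, 8610, 8650)] <- 1

# Round the date range out to whole years and put one tick on each 1 January
first_year <- as.integer(format(min(dates), "%Y"))
last_year  <- as.integer(format(max(dates), "%Y")) + 1L
ticks <- seq(as.Date(paste0(first_year, "-01-01")),
             as.Date(paste0(last_year, "-01-01")), by = "year")

plot(dates, event, type = "h", col = 2, xaxt = "n",
     xlab = "Year", ylab = "Event")
axis(1, at = ticks, labels = format(ticks, "%Y"), las = 2, cex.axis = 0.7)
```

This way no year has to be known before plotting; the ticks follow whatever span the data covers.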

And if you want to add ticks at the positions of your data with years, you can try this. This is not really clean as you can have years multiple times but that's a start:

plot(DJI.raw$Date, DJI.raw$InterDay < -4, pch=19, col=2, type='h', 
     xlab="Year", ylab="Days with > 4% change", xaxt="n",
     cex.axis=.7, cex.main=.8, cex.lab =.8,las=2, 
     main  = "Clustering of big drop days")

axis(1, at = DJI.raw$Date[which(DJI.raw$InterDay < -4)], 
  labels = format(DJI.raw$Date[which(DJI.raw$InterDay < -4)], "%Y"))
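
One possible way to avoid the repeated years is to keep only the first event of each year as a tick. This is a sketch on made-up dates (drop_dates is hypothetical) and it assumes the dates are sorted in increasing order.

```r
# Synthetic, sorted event dates (illustrative values only)
drop_dates <- as.Date(c("1987-10-19", "1987-10-26", "2008-09-29",
                        "2008-10-07", "2008-10-15", "2011-08-08"))

# TRUE for the first event seen in each calendar year, FALSE for later ones
first_in_year <- !duplicated(format(drop_dates, "%Y"))
tick_at  <- drop_dates[first_in_year]
tick_lab <- format(tick_at, "%Y")

# Draw the events without the default axis, then add one label per year
plot(drop_dates, rep(1, length(drop_dates)), type = "h", col = 2,
     xaxt = "n", xlab = "Year", ylab = "Event")
axis(1, at = tick_at, labels = tick_lab, las = 2, cex.axis = 0.7)
```
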
Sébastien Rochette
  • Thank you for your answer. I actually managed to solve the immediate problem, but with less elegant code than your `axis(1, at=2000:2016)`; unfortunately, when I now try to get the same result with your `axis()` command, nothing prints. There is no error, but nothing prints. Also, when I said that I didn't know the years ahead of plotting, I meant the correspondence between the areas with a higher density of vertical bars and the corresponding year (2008). – Antoni Parellada May 28 '17 at 12:43
  • It is not printed because your x-axis is in Date format. I edited my answer. – Sébastien Rochette May 28 '17 at 14:05
  • I added a complementary proposition. `paste0` is to create a date-like string with year, month and day, so that it is easily converted with `as.Date` – Sébastien Rochette May 28 '17 at 14:12
  • I am not used to xts. Try some things by yourself. If you are stuck at some point, you can post a new question on Stack Overflow, with a link to this post if needed. Normally, Stack Overflow is not meant for answering a "how do I do that?" question... – Sébastien Rochette May 28 '17 at 18:59

Evidently, the OP has an intrinsic problem: it initially handles what should be an extensible time series (xts) as an ad hoc data frame (is.data.frame(DJI) returns TRUE) with as.Date()-formatted row names. This allows coercion by functions such as chartSeries, which would otherwise fail with "chartSeries requires an xtsible object". At the same time, it keeps the Date column in a parallel data frame (DJI.raw) used for subsetting.

Unfortunately, these conceptual problems at the root of the inefficient coding are brushed over in the rest of the page, and "how-to" recipes take precedence over true didactic guidance. Further, the post should have been marked as a duplicate, because there is a worthy answer perfectly addressing the issue here. Since I can't delete the OP at this point, I'll post the application of the beautiful answer by @A5C1D2H2I1M1N2O1R2T1 to the current issue.

Using proper formatting to handle time-indexed data, here is a solution that avoids the unnecessary duplicate data frames in the OP:


Importing and formatting the dataset as xts:

require(RCurl)
require(xts)                       # Provides as.xts()
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date          # Assigning Date to row names
DJI$Date <- NULL                   # Removing the Date column
DJI <- as.xts(DJI)

Adding a column to DJI with the percent change between consecutive closing prices:

# Function to calculate % change in closing price between days:
D2D = function (x) {              
  days = nrow(x)
  delta = numeric(days)
  for(i in 2:days){
    delta[i] <- (100*((x[i,1] - x[i - 1,1])/(x[i - 1,1])))
  }
  delta
}

z <- as.data.frame(DJI$Adj.Close)    # Subsetting closing price
DJI$InterDay <- D2D(z)               # Included as add'l column to DJI.
#... we need something to fill in the 0 in row 1. Why not the second value?
DJI$InterDay[1]<-DJI$InterDay[2]
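
As an aside, the loop in D2D can probably be replaced by a vectorized expression. The sketch below uses a hypothetical helper name, pct_change, and a made-up price vector; it works on a plain numeric vector, so for an xts column one would likely pass something like as.numeric(DJI$Adj.Close) (an assumption, not tested against the DJI file).

```r
# Day-to-day percent change of a numeric price vector.
# The first element has no previous day, so it is set to 0, as in D2D.
pct_change <- function(price) {
  c(0, 100 * diff(price) / head(price, -1))
}

# Made-up closing prices for illustration
close <- c(100, 102, 96.9, 96.9)
print(pct_change(close))
```
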

And finally the plot... First without the x axis:

plot(time(DJI), DJI$InterDay < -4, col=2, type='h', xaxt="n",
     xlab="Year", ylab="Days with > -4% change",
     cex.axis=.7, cex.main=.8, cex.lab =.8,las=2, 
     main  = "Clustering of big drop days")


... and adding the x axis:

# Selecting the time information in the names of the rows of DJI:
tt = time(DJI) 
# We select points spaced by approximately 1 business year:
# 365 days - 2 days off each weekend - 9 Holidays in the USA
ix = seq(1, nrow(DJI), by = 365 - 2*4*12 - 9) 
# Formatting the labels as just simply the year with two digits:
fmt = "%y"
# Generating vector of potential labels:
labs = format(tt, fmt) 
# Plotting the x axis:
axis(side = 1, at = tt[ix], labels = labs[ix],
     cex.axis = 0.7, las = 2)
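
A fixed step between ticks is only approximate and can drift away from the start of each year over a long series. One alternative sketch is to take the first observation of each calendar year as the tick position. Here tt is a synthetic weekday-only index standing in for time(DJI), and y is a made-up series.

```r
# Hypothetical stand-in for time(DJI): all weekdays from 2005 through 2009
all_days   <- seq(as.Date("2005-01-01"), as.Date("2009-12-31"), by = "day")
is_weekend <- as.POSIXlt(all_days)$wday %in% c(0, 6)  # 0 = Sunday, 6 = Saturday
tt <- all_days[!is_weekend]

# Position of the first observation in each calendar year
yrs <- format(tt, "%Y")
ix  <- as.integer(tapply(seq_along(tt), yrs, min))

# Made-up indicator series, drawn without the default x axis
y <- as.numeric(seq_along(tt) %% 97 == 0)
plot(tt, y, type = "h", col = 2, xaxt = "n", xlab = "Year", ylab = "Event")

# One two-digit year label at the first observation of every year
axis(side = 1, at = tt[ix], labels = format(tt[ix], "%y"),
     cex.axis = 0.7, las = 2)
```
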


Antoni Parellada