I need help making a really simple plot: a line graph with one line per good, for two sets of prices. Both are time series (X = time, Y = prices). My data set follows the format:

#Date    prices1   prices2

The dates all follow the format YYYY-MM-DD, and the two price columns are numbers. I have checked the class of all three columns to ensure that they are what they are supposed to be ("Date" , "numeric" and "numeric" respectively). Also a few things I feel I should mention:

  • The data was retrieved by a Quandl() call, and the initial data frames had different lengths, so I had to join them using full_join(). I still checked the class() of each column in the final data frame and they are correct.

  • The price1 column has a length of 91, while price2 column has a length of 100. I initially thought this was the source of the problem. But after having set df$price2[92:100] = NA , I still have the same problem (I can plot each of the lines separately, but neither show up when I use the lines() function).

  • Furthermore, I made a separate script where I built a three-column data frame with 100 rows and NA's for the first ten values of col1, NA's for the 11th to 20th values of col2, etc.

Now, I did not make them time-series objects and tried graphing them simply as normal data frames. I can plot both of them on their own, but I cannot for the life of me plot one and use the lines() function for the other. What could I be missing? If NA's are the issue, then why am I unable to do the two-line plot with the Quandl data while my test data came out fine?

Due to the circumstances of the problem, I've decided to share the Quandl script and the test script.

#Original Script with issues
#Retrieving Data1
library(dplyr)
library(zoo)
library("Quandl")

data.1 = Quandl("JODI/OIL_TCPRKL_VEN")
#Putting data in chronological order

      #not in order
      print(data.1$Date[1])
      print(data.1$Date[length(data.1$Date)])

data.1 = data.frame(
  data.1$Date[length(data.1$Date):1],
  data.1$Value[length(data.1$Value):1]
)
names(data.1) = c("Date", "Value1")

      #Now in order
      print(data.1$Date[1])
      print(data.1$Date[length(data.1$Date)])





#Retrieving data2
data.2 = Quandl("JODI/OIL_TCPRKB_IRQ")

      #not in order
      print(data.2$Date[1])
      print(data.2$Date[length(data.2$Date)])

data.2 = data.frame(
  data.2$Date[length(data.2$Date):1],
  data.2$Value[length(data.2$Value):1]
)
names(data.2) = c("Date", "Value2")

      #now in order 
      print(data.2$Date[1])
      print(data.2$Date[length(data.2$Date)])


#join the data
data.join = data.frame(full_join(data.1, data.2))


plot(data.join$Date, data.join$Value1,
     col = "blue",
     main = "Should have both lines",
     type = "l",
     sub = "only one of them shows up though. Why?",
     xlab = "Date",
     ylab = "Values")
lines(data.join$Value2)
#plot only has one line. Why??

Here is also a test script I made where I do not seem to be having the issue.

library(dplyr)
library(zoo)


time.a = as.Date(c(10:30))
time.b = as.Date(c(20:40))
time.c = as.Date(c(30:50))

value.a = as.numeric(seq(10,30,1))
value.b = as.numeric(seq(20,60,2))
value.c = as.numeric(seq(20,30,.5))

    length(time.a)
    length(time.b)
    length(time.c)

    length(value.a)
    length(value.b)
    length(value.c)


    print(time.a)
    print(time.b)
    print(time.c)

    print(value.a)
    print(value.b)
    print(value.c)



data.a = data.frame(time.a, value.a)
data.b = data.frame(time.b, value.b)
data.c = data.frame(time.c, value.c)

names(data.a) = c("Date", "Value.a")
names(data.b) = c("Date", "Value.b")
names(data.c) = c("Date", "Value.c")

all.data = full_join(data.a, data.b)
all.data = full_join(all.data, data.c)


plot(all.data$Date, all.data$Value.a,
     type = "l",
     main = "plot",
     xlab = "Date",
     ylab = "Values")

lines(all.data$Date, all.data$Value.b,
      col = "blue")


lines(all.data$Date, all.data$Value.c,
      col = "red")

I am really trying to understand why the first script doesn't work, while my second one does. Any help or hints would be greatly appreciated. Why doesn't it work?

aosmith
im2wddrf
  • Are they in the same range? Your first `plot` call will define the range of the y-axis. By default, it will fit it to your y values. If your first series, say, has values between 1 and 10 and your second series has values between 50 and 70, it will be out of range and won't show up. You can specify `ylim` in the `plot()` call to override the default. – Gregor Thomas Jul 03 '17 at 17:27
  • @Gregor I have considered this as well. I used the range function and this is what I got: `range(na.omit(data.join$Value1))` gives `[1] 11275.4 15071.7` and `range(data.join$Value2)` gives `[1] 62440 151621`. I also tried plotting while setting the range. Still, the lines do not show up. – im2wddrf Jul 04 '17 at 07:51
  • What did you set the range to, since those ranges are far from overlapping? It seems like that may indeed be the problem. The general way is to set `ylim = c(min(c(data.join$Value1, data.join$Value2), na.rm = T), max(c(data.join$Value1, data.join$Value2), na.rm = T))` – Gregor Thomas Jul 04 '17 at 15:56
  • If you need more help, I'd encourage you to make a **minimal** example. I, and many other people, are reluctant to go to all the trouble to install Quandl, download data, and follow your data manipulation steps, for a simple plotting issue. Instead of sharing 50 lines of code to create and process data, you should just share the final data in a copy/pasteable format: `dput(droplevels(head(data.join[c("Date", "Value1", "Value2")])))`. [See here for more tips](https://stackoverflow.com/q/5963269/903061) – Gregor Thomas Jul 04 '17 at 16:06
  • @Gregor I understand. I used your formula above. Here is the output I got. I am not quite sure what I am looking at. It shows some values but not the whole data frame. : `structure(list(Date = structure(c(14275, 14303, 14334, 14364, 14395, 14425), class = "Date"), Value1 = c(14347.197, 12856.3706, 14623.1995, 13998.8553, 14381.6974, 13688.8295), Value2 = c(68603, 62440, 73439, 69930, 72850, 73500)), .Names = c("Date", "Value1", "Value2"), row.names = c(NA, 6L), class = "data.frame")` – im2wddrf Jul 04 '17 at 20:54
  • Right, because we want a **minimal** example, not the whole data frame. `head()` returns the first 6 rows. – Gregor Thomas Jul 05 '17 at 14:44
  • @Gregor Thanks! I finally got it to work. I think the problem I had was with the ylim. I initially set it manually with `ylim = c(11000 , 17000)`, but it didn't work. I think setting the range with the `range()` function helped. I also had not considered using the `na.rm` argument. It finally works. Thank you so much. – im2wddrf Jul 05 '17 at 19:54
  • Right, a range of 11k to 17k doesn't come close to the 60k to 80k where the `Value2` series is. – Gregor Thomas Jul 05 '17 at 20:51
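
The `ylim` and `na.rm` points from the comments above can be checked without drawing anything. This is a minimal sketch, assuming a hypothetical data frame `demo`: the dates and most values are taken from the `dput()` output in the comments, and the `NA` in `Value1` is made up to mimic what a full join leaves behind.

```r
# Hypothetical toy data: values borrowed from the dput() output above,
# with one made-up NA in Value1 (as a full join would produce).
demo <- data.frame(
  Date   = as.Date(c("2009-01-31", "2009-02-28", "2009-03-31")),
  Value1 = c(14347.197, 12856.3706, NA),
  Value2 = c(68603, 62440, 73439)
)

# Without na.rm, a single NA turns the whole range into NA,
# which plot() cannot use as ylim.
naive_ylim <- range(c(demo$Value1, demo$Value2))

# With na.rm = TRUE the NA is dropped before min and max are taken.
shared_ylim <- range(c(demo$Value1, demo$Value2), na.rm = TRUE)

# A hand-set ylim that only covers Value1 leaves every Value2 point
# outside the plotting region.
manual_ylim <- c(11000, 17000)
value2_visible <- demo$Value2 >= manual_ylim[1] & demo$Value2 <= manual_ylim[2]

print(naive_ylim)
print(shared_ylim)
print(value2_visible)
```

Passing `ylim = shared_ylim` to the first `plot()` call is what makes room for both series.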

1 Answer

Your problem is the y ranges.

Using your sample data:

df = structure(list(Date = structure(c(14275, 14303, 14334, 14364, 14395,
 14425), class = "Date"), Value1 = c(14347.197, 12856.3706, 14623.1995,
 13998.8553, 14381.6974, 13688.8295), Value2 = c(68603, 62440, 73439,
 69930, 72850, 73500)), .Names = c("Date", "Value1", "Value2"), row.names = 
 c(NA, 6L), class = "data.frame")

We can see that the ranges are nowhere close to overlapping, so you need to pre-define the limits of the plot:

df_range = range(c(df$Value1, df$Value2), na.rm = TRUE)
plot(df$Date, df$Value1, type = "l", ylim = df_range)
lines(df$Date, df$Value2, col = "firebrick4")
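
One more difference between the two scripts in the question may also matter: the first script calls `lines(data.join$Value2)` with no x argument, while the test script passes the dates. When only one vector is given, base R plots it against its index (1, 2, 3, ...), which is nowhere near the numeric values of 2009 dates on the x-axis. A minimal sketch of this, with a three-row version of the sample data rebuilt so the block runs on its own (`pdf(NULL)` is only there so the sketch draws to a null device instead of opening a window):

```r
# First three rows of the sample data, rebuilt here for a standalone run.
df <- data.frame(
  Date   = as.Date(c("2009-01-31", "2009-02-28", "2009-03-31")),
  Value1 = c(14347.197, 12856.3706, 14623.1995),
  Value2 = c(68603, 62440, 73439)
)

# With a single vector, the x coordinates default to the index 1, 2, 3, ...
index_x <- xy.coords(df$Value2)$x

# Dates are stored as days since 1970-01-01, so the x-axis sits around 14,000.
date_x <- as.numeric(df$Date)

pdf(NULL)  # null device: nothing pops up and nothing is written to disk
plot(df$Date, df$Value1, type = "l",
     ylim = range(c(df$Value1, df$Value2), na.rm = TRUE))
lines(df$Value2)                              # x = 1, 2, 3: off the left edge
lines(df$Date, df$Value2, col = "firebrick4") # x = dates: visible
invisible(dev.off())
```

So in the original script both things are needed: a `ylim` that covers both series, and `lines(data.join$Date, data.join$Value2)` rather than `lines(data.join$Value2)`.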

Gregor Thomas