I have the following very simple R script which uses zoo to visualize daily numbers on a timeline:

# Small script to plot daily number of added users to database
library(zoo)

data <- read.csv("search_history.csv", header=FALSE)
# last line will be cut because it might be incomplete
zoodata <- data[1:(length(data$V2)-1), ]
series <- zoo(zoodata$V2, zoodata$V1)

par(mar=c(7, 6, 4, 2), 
    lab=c(5, 6, 5), 
    mgp = c(4, 1, 0))

plot(series,
     main="Number of users added to database over time", 
     xlab="Date", 
     ylab="Number of users",
     las=2,
     lwd=2,
     col="red",
     cex.axis=0.7)

Content of search_history.csv:

"2012-12-27","458","4728"
"2012-12-28","239","6766"
"2012-12-29","193","8189"
"2012-12-30","148","7698"
"2012-12-31","137","7370"
"2013-01-01","119","6324"
"2013-01-02","122","7016"
"2013-01-03","115","7986"
"2013-01-04","112","8222"
"2013-01-05","112","6828"
"2013-01-06","124","7318"
"2013-01-07","121","8228"
"2013-01-08","120","8158"
...

I want to visualize the first (V1) and the second column (V2). I basically have two problems. The first and most obvious one is the dashed lines at y positions ~50 and ~450. How can I remove them, and why are they even there?

The second problem is the inclusion of 2013-01-26 on the x-axis. As you can see, I removed the last line of the dataset, which contains this date (like an amateur; maybe there is a better way to do this), so the plot should not include it. I don't understand why the plot even knows about this date, since it takes zoodata as input, not data.

[image: my plot]

GSee
grssnbchr
  • Alright, you were too fast and already posted an answer ;-) That did the trick, thank you. You can answer the question and I will accept it. I would be glad for an explanation of stringsAsFactors though; I did not really understand the manual's explanation. One last thing: with these settings, I now have only 4 x-label ticks (Dec 27 to Jan 21). How can I increase that number, maybe to 8? The parameter "lab" does not seem to work as expected, or I just don't really understand it. – grssnbchr Jan 26 '13 at 20:54
  • Do you mean `?factor` didn't make sense? Here's an argument against using stringsAsFactors=TRUE (http://stackoverflow.com/a/1368460/967840). – GSee Jan 26 '13 at 21:02
  • See this post http://stackoverflow.com/a/4355118/967840 for how to use different x-label ticks – GSee Jan 26 '13 at 21:04
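
On the tick question raised in these comments, here is a minimal base-R sketch of one common approach: suppress the default x axis with `xaxt = "n"` and draw one with `axis()` at positions you choose. The `dates`, `users` and `ticks` names and the inline sample are assumptions for illustration, not taken from the linked post. The same idea should carry over to `plot(series, ...)` on a zoo object with a Date index.

```r
# Hypothetical daily sample; dates and values follow the question's pattern.
dates <- seq(as.Date("2012-12-27"), by = "day", length.out = 13)
users <- c(458, 239, 193, 148, 137, 119, 122, 115, 112, 112, 124, 121, 120)

# Choose 8 tick positions spread evenly over the date range.
ticks <- seq(min(dates), max(dates), length.out = 8)

# Suppress the default x axis, then draw one with explicit positions.
plot(dates, users, type = "l", xaxt = "n", las = 2,
     xlab = "Date", ylab = "Number of users")
axis(1, at = as.numeric(ticks), labels = format(ticks, "%b %d"),
     las = 2, cex.axis = 0.7)
```

Changing `length.out` changes the number of ticks; the `format()` string controls how each label is written.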

2 Answers

You can use read.zoo:

## double quotes were removed but they could have been left in
series <- read.zoo(text = '
2012-12-27,458,4728
2012-12-28,239,6766
2012-12-29,193,8189
2012-12-30,148,7698
2012-12-31,137,7370
2013-01-01,119,6324
2013-01-02,122,7016
2013-01-03,115,7986
2013-01-04,112,8222
2013-01-05,112,6828
2013-01-06,124,7318
2013-01-07,121,8228
2013-01-08,120,8158', sep =',')

Then, using your plot call:

plot(series,
     main="Number of users added to database over time", 
     xlab="Date", 
     ylab="Number of users",
     las=2,
     lwd=2,
     col="red",
     cex.axis=0.7)

You can visually compare the two time series:

[plot image]
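
As a hedged sketch of the same idea with the double quotes left in: the four-row `csv_text` sample and the `users` name are assumptions for illustration. It also shows `head(x, -1)` as one alternative way to drop a possibly incomplete last row, and picks out only the first data column so that a single series is plotted.

```r
library(zoo)

# Hypothetical inline sample in the question's format, quotes left in.
csv_text <- '"2012-12-27","458","4728"
"2012-12-28","239","6766"
"2012-12-29","193","8189"
"2012-12-30","148","7698"'

# With a real file, pass the path "search_history.csv" instead of text = csv_text.
series <- read.zoo(text = csv_text, sep = ",", header = FALSE)

# Keep the first data column and drop the last (possibly incomplete) row.
users <- head(series[, 1], -1)

class(index(users))   # the index read.zoo inferred from the first column
length(users)         # one observation fewer than in the input
```

Here `read.zoo` is left to guess the index class from the first column; passing `format = "%Y-%m-%d"` would make the Date conversion explicit.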

G. Grothendieck
agstudy
  • @agstudy, why'd you remove all the quote marks? It makes it look like you have to do extra manual work that you don't really have to do. – GSee Jan 26 '13 at 21:08
  • @GSee it looks that way, but no: I removed them because I tested with `read.zoo(text=..` and not `read.zoo(file=..`. Your answer is good because it deals with his problem; mine is just to tell him that `read.zoo` exists. I tested it with quotes and it also works, but I think it is not a good idea to mix single quotes and double quotes... Thanks for the catch. – agstudy Jan 26 '13 at 21:12
  • @wnstnsmth, there's an entire [Reading Data in zoo](http://cran.r-project.org/web/packages/zoo/vignettes/zoo-read.pdf) vignette if you haven't seen it yet. – GSee Jan 26 '13 at 21:20

Two things: Your strings are being read as factors, and you are indexing your zoo object by a character vector instead of by Dates.

If you include stringsAsFactors=FALSE in your read.csv call and give your zoo object a Date index it will look more like you were expecting.
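
To illustrate the point about factors, here is a minimal base-R sketch; the three-row `csv_text` sample and the variable names are assumptions for illustration. Since R 4.0.0, `read.csv` defaults to `stringsAsFactors = FALSE`, so the sketch sets the argument explicitly to reproduce the older behaviour.

```r
# Three rows in the question's format (hypothetical inline sample).
csv_text <- '"2012-12-27","458","4728"
"2012-12-28","239","6766"
"2012-12-29","193","8189"'

# stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0.
as_factor <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

class(as_factor$V1)   # "factor": categories stored as integer codes
class(as_string$V1)   # "character": plain strings

# A factor has no notion of time; converting it to integers yields level codes.
as.integer(as_factor$V1)

# Dates are ordered values with a well-defined spacing in days.
dates <- as.Date(as_string$V1)
class(dates)          # "Date"
diff(dates)           # gaps between consecutive observations
```

A zoo index built from `dates` lets the plot place observations on a real time axis, which a factor or character index cannot do.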

library(zoo)
# continuation lines start at column 1 so no stray whitespace ends up in the first field
data <- read.csv(text='"2012-12-27","458","4728"
"2012-12-28","239","6766"
"2012-12-29","193","8189"
"2012-12-30","148","7698"
"2012-12-31","137","7370"
"2013-01-01","119","6324"
"2013-01-02","122","7016"
"2013-01-03","115","7986"
"2013-01-04","112","8222"
"2013-01-05","112","6828"
"2013-01-06","124","7318"
"2013-01-07","121","8228"
"2013-01-08","120","8158"', header=FALSE,
                 stringsAsFactors=FALSE)

zoodata <- data[1:(length(data$V2)-1), ]
series <- zoo(zoodata$V2, as.Date(zoodata$V1))

par(mar=c(7, 6, 4, 2), 
    lab=c(5, 6, 5), 
    mgp = c(4, 1, 0))

plot(series,
     main="Number of users added to database over time", 
     xlab="Date", 
     ylab="Number of users",
     las=2,
     lwd=2,
     col="red",
     cex.axis=0.7)

Which produces:

[plot image]

GSee