
I have two problems handling my time variable in GNU R!

Firstly, I cannot convert the time data (downloadable here) from factor (or character) to a date class with `as.POSIXlt` or `as.Date` without getting an error message like this:

character string is not in a standard unambiguous format
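A minimal reproduction of this error, assuming each value carries a literal pair of quotation marks (as the comments below eventually uncover; the value itself is hypothetical). Without a `format`, `as.Date()` only tries `"%Y-%m-%d"` and `"%Y/%m/%d"` on the first non-missing element, and it stops with this error when neither matches. With an explicit `format`, an unparseable string becomes `NA` instead of an error:

```r
# Hypothetical value shaped like the ones in the CSV, including the stray quotes
x <- '"1250-02-46"'

# No format: as.Date() tries "%Y-%m-%d" and "%Y/%m/%d", neither matches, so it errors
msg <- tryCatch(as.Date(x), error = function(e) conditionMessage(e))
print(msg)

# Explicit format: no error, but the leading quote still cannot match %Y, giving NA
print(as.Date(x, format = "%Y-%m-%j"))
```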

I then tried to convert my time data with:

dates <- strptime(time, "%Y-%m-%j")

which only gives me:

NA
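`strptime()` returns `NA` rather than an error when a string does not match the format. Below is a self-contained sketch of the likely cause. It assumes the CSV field is triple-quoted, as one of the comments below suspects; the two rows and the population numbers are hypothetical.

In a quoted CSV field, a doubled quote is read as a literal `"`. One pair of quotation marks therefore survives into each value, and the leading `"` cannot match `%Y`. The time zone is set explicitly, as another comment recommends; `"UTC"` is an arbitrary choice.

```r
# Hypothetical two-row CSV using the same triple quoting as the original file
csv <- 'time,Number_Humans
"""1250-02-46""",100
"""1300-01-01""",120'

df <- read.csv(text = csv, stringsAsFactors = FALSE)
print(df$time)        # each value still carries one pair of literal quotes

# The leading quote cannot match %Y, so every element parses to NA
bad <- strptime(df$time, "%Y-%m-%j", tz = "UTC")
print(is.na(bad))     # TRUE TRUE

# Strip the quotes and parse again
clean <- gsub('"', "", df$time, fixed = TRUE)
good <- strptime(clean, "%Y-%m-%j", tz = "UTC")
print(good)
```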

Secondly, the reason I wanted (had) to convert my time data is that I want to plot it with ggplot2 and adjust `scale_x_continuous` (as described here) so that the x-axis is labelled only every 50 years (i.e. 1250-01-01, 1300-01-01, etc.); otherwise the x-axis is too busy (see graph below).

[Graph: the plot produced by the code below, with an overcrowded x-axis]

This is the code I use:

library(ggplot2)
library(scales)
library(reshape)
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
attach(df)
dates <- as.character(time)
population <- factor(Number_Humans)
ggplot(df, aes(x = dates, y = population)) + geom_line(aes(group=1), colour="#000099") + theme(axis.text.x=element_text(angle=90)) + xlab("Time in Years (A.D.)")

Til Hund
  • What kind of date is "1250-02-46"? It's rare for February to have more than 29 days. Also, don't use `attach`. – Roland Nov 14 '14 at 14:11
  • Why would you not recommend using `attach`? 46 in this case is the 46th day of the year. This is not something I did; it's how the program I use writes dates. Is this really a problem for R? – Til Hund Nov 14 '14 at 14:13
  • You mean, if you pass R something and you tell it that "46" is "Day of the month as decimal number (01–31)", and it's not actually the day of the month, will R get confused? Shockingly, yes. You want %j, which is "Day of year as decimal number (001–366)." – Oliver Keyes Nov 14 '14 at 14:18
  • Huh; you're using %j. If I run it, https://gist.github.com/Ironholds/12097162d25dc935960e is the output. That's not happening for you? – Oliver Keyes Nov 14 '14 at 14:20
  • Your timestamps appear to have three levels of quoting. Are those quotes making it into the values? – Oliver Keyes Nov 14 '14 at 14:22
  • Yes, I think this is the issue. – Til Hund Nov 14 '14 at 22:22
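The comments never spell out why `attach` is discouraged. One common reason is that `attach()` puts a *copy* of the columns on the search path. Later changes to the data frame are therefore not seen through the attached names.

The sketch below shows this with a hypothetical two-row data frame. It also shows a second issue specific to this data set: while attached, a column named `time` masks the `stats::time()` function.

```r
df <- data.frame(time = c("a", "b"), stringsAsFactors = FALSE)

attach(df)            # copies the columns; R typically warns that `time` masks stats::time
df$time <- toupper(df$time)

stale <- time         # resolves to the attached copy, not the updated column
detach(df)

print(stale)          # "a" "b": the change to df$time is invisible here
print(df$time)        # "A" "B"
```

Referring to columns explicitly (`df$time`) or passing `data = df` to functions avoids both problems.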

1 Answer


The values in the `time` column contain literal quotation marks. Remove them first; then you can convert the column to `Date`:

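library(ggplot2)
df <- read.csv(file="https://dl.dropboxusercontent.com/u/109495328/time.csv")
# Remove the literal double quotes embedded in every time value
df$time <- gsub('\"', "", as.character(df$time), fixed=TRUE)
# %j is the day of the year (001-366); as.Date() takes a format, so no time zone is needed
df$time <- as.Date(df$time, format="%Y-%m-%j")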


ggplot(df, aes(x = time, y = Number_Humans)) + 
     geom_line(colour="#000099") + 
     theme(axis.text.x=element_text(angle=90)) + 
     xlab("Time in Years (A.D.)") 
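This code converts the column but does not yet address the second problem: labelling the x-axis only every 50 years. Once `time` is a `Date`, the matching scale is `scale_x_date()`, whereas `scale_x_continuous()` is meant for numeric axes.

The sketch below assumes ggplot2 is installed and uses hypothetical stand-in data in place of the original CSV. Rounding the break range out to whole 50-year marks is just one reasonable choice.

```r
library(ggplot2)

# Hypothetical stand-in data: one date per year from 1250 to 1500 and
# placeholder population values (not the original CSV)
df <- data.frame(time = seq(as.Date("1250-01-01"), as.Date("1500-01-01"), by = "1 year"))
df$Number_Humans <- seq_len(nrow(df))

# One break every 50 years, from the 50-year mark at or before the first date
# to the 50-year mark at or after the last date
first_year <- as.integer(format(min(df$time), "%Y"))
last_year  <- as.integer(format(max(df$time), "%Y"))
from <- as.Date(sprintf("%04d-01-01", 50L * (first_year %/% 50L)))
to   <- as.Date(sprintf("%04d-01-01", 50L * ((last_year + 49L) %/% 50L)))
breaks_50 <- seq(from, to, by = "50 years")

p <- ggplot(df, aes(x = time, y = Number_Humans)) +
  geom_line(colour = "#000099") +
  scale_x_date(breaks = breaks_50, date_labels = "%Y-%m-%d") +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Time in Years (A.D.)")
```

Call `print(p)` (or just type `p` at the console) to draw the plot. The labels then read 1250-01-01, 1300-01-01, and so on.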

[Graph: the plot produced by the code above, with a Date x-axis]

erc
  • Also, `aes(group=1)` is superfluous. – Roland Nov 14 '14 at 14:34
  • Hm, when I remove it I get: `geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?`. I wonder if something is wrong; I don't know where that vertical line comes from either. – erc Nov 14 '14 at 14:35
  • You should not turn all columns of the data.frame to characters. Use `df$time <- gsub('\"', "", as.character(df$time), fixed=TRUE)`. – Roland Nov 14 '14 at 14:36
  • If you use `as.POSIXct` or `strptime`, you should always specify the time zone explicitly. Here it would be sufficient to use `as.Date`, which has a `format` argument. E.g., `df$time <- as.Date(df$time, format="%Y-%m-%j")`. – Roland Nov 14 '14 at 14:44
  • @Roland, why would you not recommend using `attach`? Another question: when I convert `Number_Humans` into a factor, I need `aes(group=1)`. Why? – Til Hund Nov 14 '14 at 22:25
  • @TilHund I explained the group=1 thing here: http://stackoverflow.com/questions/10357768/plotting-lines-and-the-group-aesthetic-in-ggplot2 – erc Nov 15 '14 at 10:15
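Regarding `aes(group=1)`: `factor(Number_Humans)` turns the counts into a discrete variable. ggplot2 then groups the data by its values, which is why the line needs a single explicit group. The conversion also discards the numeric scale.

The base-R sketch below (hypothetical values) shows the second point. A factor stores integer level codes, so converting it back with `as.numeric()` alone returns those codes rather than the original numbers:

```r
counts <- c(10, 5, 20)              # hypothetical population values
f <- factor(counts)

print(levels(f))                    # "5" "10" "20": levels are sorted, stored as strings
print(as.numeric(f))                # 2 1 3: the internal level codes
print(as.numeric(as.character(f)))  # 10 5 20: the original values
```

Keeping `Number_Humans` numeric, as in the answer's `ggplot()` call, should make the `group` override unnecessary.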