I have a data frame with two columns. The first is a numerical value, the other is a string describing a time. The time format looks like yyyy-mm-dd--hh-mm-ss-?????? (e.g. 2015-03-04--12-11-35-669696); I don't know what the last 6 digits mean. For example:

       y                        time
1  4.548 2014-08-11--09-07-44-202586
2  4.548 2014-08-11--09-07-54-442586
3  4.548 2014-08-11--09-08-04-522586
4  4.478 2014-08-11--09-08-14-762586
5  4.431 2014-08-11--09-08-24-522586
6  4.446 2014-08-11--09-08-34-922586
7  4.492 2014-08-11--09-08-44-522586
8  4.508 2014-08-11--09-08-54-442586
9  4.486 2014-08-11--09-09-04-202586
10 4.497 2014-08-11--09-09-14-442586
11 4.461 2014-08-11--09-09-24-202586

I want to plot them with

ggplot(df, aes(x=time, y=y)) + geom_line()

The problem is that ggplot doesn't know how to deal with data of class character, and in particular with my time format. I tried AsciiToInt from the package {sfsmisc} to convert the strings to numerical values, but it returns a list of integers for each string (one number per character, of course). I can also sort my time strings with mixedsort from the package {gtools}, but I don't know how to apply that to the plot (also keeping in mind the distance between the time points).

Another problem is that I don't want every time string to appear as a tick on the x-axis, because I have around 20k rows. Maybe I can solve that problem like in this question, but I cannot check that as long as the first problem occurs.

Can you help me plot such data with the time as a numeric-like value on the x-axis?

Jojo

1 Answer

I saved your data to a CSV file and read it in as timedat. First I convert the time column to POSIXct. To make a cleaner graph for test purposes I omit the seconds field; if you want to include them, use the commented-out line instead.

library(ggplot2)

# Read the data (the path is from my machine; adjust it to yours).
timedat <- read.csv("~/Work/Timedat.csv")
timedat
str(timedat)
# 'data.frame':   11 obs. of  3 variables:
#  $ X   : int  1 2 3 4 5 6 7 8 9 10 ...
#  $ y   : num  4.55 4.55 4.55 4.48 4.43 ...
#  $ time: Factor w/ 11 levels "2014-08-11--09-07-44-202586",..: 1 2 3 4 5 6 7 8 9 10 ...

# To keep the seconds, use this line instead of the one below:
# timedat$time <- as.POSIXct(as.character(timedat$time), format = "%Y-%m-%d--%H-%M-%S")

# Parse down to the minute only; strptime ignores the trailing characters.
timedat$time <- as.POSIXct(as.character(timedat$time), format = "%Y-%m-%d--%H-%M")

qplot(data = timedat, y = y, x = time) + theme_bw()

> timedat
    X     y                        time
1   1 4.548 2014-08-11--09-07-44-202586
2   2 4.548 2014-08-11--09-07-54-442586
3   3 4.548 2014-08-11--09-08-04-522586
4   4 4.478 2014-08-11--09-08-14-762586
5   5 4.431 2014-08-11--09-08-24-522586
6   6 4.446 2014-08-11--09-08-34-922586
7   7 4.492 2014-08-11--09-08-44-522586
8   8 4.508 2014-08-11--09-08-54-442586
9   9 4.486 2014-08-11--09-09-04-202586
10 10 4.497 2014-08-11--09-09-14-442586
11 11 4.461 2014-08-11--09-09-24-202586

This produces the following plot, with the dates nicely ordered. [image: plot of y against time]
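If the trailing six digits turn out to be fractional seconds (microseconds), they can be kept as well. That is only a guess, since the question says their meaning is unknown. A minimal sketch under that assumption, using a few of the strings from the question and a fixed UTC time zone (also an assumption) so the result does not depend on the local machine:

```r
# Example strings copied from the question.
raw <- c("2014-08-11--09-07-44-202586",
         "2014-08-11--09-07-54-442586",
         "2014-08-11--09-08-04-522586")

# ASSUMPTION: the last six digits are microseconds. Turn the final "-" into "."
# so the seconds field reads e.g. "44.202586".
with_fraction <- sub("-([0-9]{6})$", ".\\1", raw)

# %OS reads seconds including a fractional part on input.
parsed <- as.POSIXct(with_fraction, format = "%Y-%m-%d--%H-%M-%OS", tz = "UTC")

# Whole-second parsing for comparison; the trailing digits are ignored.
whole <- as.POSIXct(raw, format = "%Y-%m-%d--%H-%M-%S", tz = "UTC")

# Spacing between consecutive samples.
diff(parsed)
```

Either parsed or whole can be used as the x aesthetic. If the digits mean something else, the whole-second version is the safer choice.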

bjoseph
    When I also wanted to add the seconds, I used `format="%Y-%m-%d--%H-%M-%S"`, but then the ticks only showed the minutes (45,00,15,30,45,...). So I used `scale_x_datetime(breaks = date_breaks("30 sec"), labels=date_format("%Y-%m-%d %H:%M:%S"))` to fix that. – Jojo Mar 05 '15 at 09:38
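
The tick fix from the comment can be sketched as a complete example with geom_line, as in the question. This assumes a ggplot2 version that accepts the date_breaks and date_labels arguments of scale_x_datetime (2.0 or later), which do the same job as the date_breaks() and date_format() helpers from {scales} used in the comment. The data frame df is rebuilt here from the first rows of the question, and the UTC time zone is an assumption:

```r
library(ggplot2)

# First five rows from the question, rebuilt by hand for illustration.
df <- data.frame(
  y = c(4.548, 4.548, 4.548, 4.478, 4.431),
  time = c("2014-08-11--09-07-44-202586",
           "2014-08-11--09-07-54-442586",
           "2014-08-11--09-08-04-522586",
           "2014-08-11--09-08-14-762586",
           "2014-08-11--09-08-24-522586"),
  stringsAsFactors = FALSE
)

# Parse to whole seconds; tz = "UTC" is an assumption, use your own zone.
df$time <- as.POSIXct(df$time, format = "%Y-%m-%d--%H-%M-%S", tz = "UTC")

# One tick every 30 seconds, labelled with hours, minutes and seconds.
p <- ggplot(df, aes(x = time, y = y)) +
  geom_line() +
  scale_x_datetime(date_breaks = "30 sec", date_labels = "%H:%M:%S") +
  theme_bw()
p
```

With around 20k rows, a 30-second spacing will probably produce far too many ticks, so the break width (for example "1 hour") should be chosen to suit the time span of the data.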