0

Good Afternoon! I have data which consist of date and time of share price. I need to join this data to the one column.

        date  time   open   high    low  close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834

I tried to do that using this function:

df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })

but I got the column of NAs.

Also I tried to do that using:

data$date_time = mdy_hm(paste(data$date, data$time))

but I got again the error:

Warning message:
    All formats failed to parse. No formats found.

Please, tell me what I do wrong.

alistaire
  • 42,459
  • 4
  • 77
  • 117
Ruslan Sayakhov
  • 169
  • 1
  • 2
  • 7
  • 1
    Match your tokens and separators: `data$timestamp <- as.POSIXct(paste(data$date, data$time), format = '%Y.%m.%d %H:%M')` or `lubridate::ymd_hm(paste(data$date, data$time))` – alistaire Nov 20 '17 at 19:13

1 Answers1

1

In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.

> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
        date  time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame':   2 obs. of  2 variables:
 $ date: chr  "1999.04.08" "1999.04.08"
 $ time: chr  "11:00" "12:00"

Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:

strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.

So, let's break down your code:

df1 <- within(data, 
        { timestamp = strptime(paste(date, time), 
                      "%Y/%m/%d%H:%M:%S") 
        })

First, the paste function:

> paste(date[1], time[1])
[1] "1999.04.08 11:00"

This generates a character vector with the format above.

Next, the strptime command.

> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA

Okay, we see an NA. First, be sure to explicitly write format =, if it reads as tedious, then you should not be having any problems writing flawless code that you will remember forever. Looking at the help code we see:

x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"

Notice the help section also defines upper/lower case Y, and the same with the month and date variables. In your case, you are trying to extract something of the following form: YYYY/mm/ddHH:MM:SS, such as 2017/20/1111:28:30. Do you see the issue now?

Using your string extraction attempt, we modify it slightly to get the format you are looking for:

> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"

Putting it all together you get:

> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame':   2 obs. of  3 variables:
 $ date     : chr  "1999.04.08" "1999.04.08"
 $ time     : chr  "11:00" "12:00"
 $ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
        date  time           timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00

Oh yeah, and try out the dplyr package.

library(dplyr)
> df %>% 
  mutate(ts = as.POSIXct(paste(date,time), 
         format = "%Y.%m.%d %H:%M"))
        date  time                  ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
Heikki
  • 2,214
  • 19
  • 34
Kamil
  • 412
  • 4
  • 11
  • It was the great description. I understood my error. Thank you!! – Ruslan Sayakhov Nov 20 '17 at 20:45
  • Kamil, now I have the one question: I need the calculate difference between the first and second observation by ts variable: ts[2]-ts[1], but R prints that "The time difference of 1 hours. But how f.e. simplify this answer that R will print "1h" and no additional text? – Ruslan Sayakhov Nov 21 '17 at 18:05
  • There are a number of ways [1](https://stackoverflow.com/questions/30510044/how-to-make-time-difference-in-same-units-when-subtracting-posixct), [2](https://stackoverflow.com/questions/1962278/dealing-with-timestamps-in-r), [3](https://blog.exploratory.io/5-most-practically-useful-operations-when-working-with-date-and-time-in-r-9f9eb8a17465), and more. A simple hack, and this is something you should replace with something more elegant in the future, is to wrap `as.numeric` around `difftime`. Try out `as.numeric(difftime(Sys.Date(), Sys.Date()+1, units = "hours"))` – Kamil Nov 21 '17 at 23:51