strptime - set format with data in csv file

Question

Example Raw data:

1
Mon Apr 06 23:55:14 PDT 2009

2
Tue Apr 07 01:16:43 PDT 2009

3
Tue Apr 07 03:06:17 PDT 2009

I having a problem with doing the format with strptime since i having a data from a csv file.

dates <- read.csv(file = "Australia_timestamp.csv")
colnames(dates) <- c("Date")
format <- "%a %b %d %H:%M:%S %z %Y"
dates <-strptime(dates[], format = format)

Output:

NA

How can I solve this issue by not having NA and having a output with correct format?

What exactly are you struggling with? Also, if you are sure that your question has not been asked and answered before, please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Adding a MRE and an example of the desired output (in code form, not tables and pictures) makes it much easier for others to find and test an answer to your question. — dario, Oct 03 '21 at 15:42
Please consult the link in the comment above and make your data example reproducible. The problem you are having is most likely related to the structure of the data and/or date column. — dario, Oct 03 '21 at 15:57
Also, see `?strptime` for correct usage of the conversion specifications. — dario, Oct 03 '21 at 16:34
The question shows an input with a single column in which the first line contains a 1, the second describes a date and time, the third is blank and so on. Is that a correct description of what you have or is the 1, 2, 3, etc. on the same line as the date/time? — G. Grothendieck, Oct 03 '21 at 17:18

score 1 · Accepted Answer · answered Oct 03 '21 at 19:11

We may use parse_date from parsedate

library(tibble)
library(parsedate)
lines1 <- lines[nzchar(lines)]
tibble(num = lines1[c(TRUE, FALSE)], time = parse_date(lines1[c(FALSE, TRUE)]))

-output

# A tibble: 3 × 2
  num   time               
  <chr> <dttm>             
1 1     2009-04-06 23:55:14
2 2     2009-04-07 01:16:43
3 3     2009-04-07 03:06:17

data

lines <- readLines(textConnection("1
Mon Apr 06 23:55:14 PDT 2009

2
Tue Apr 07 01:16:43 PDT 2009

3
Tue Apr 07 03:06:17 PDT 2009"))
# or read from the file
lines <- readLines("file.txt")

score 0 · Answer 2 · answered Oct 03 '21 at 16:37

The NA you're getting is because %z does not capture PDT. It would fall under %Z as it is not numeric, but strptime does not support %Z for input as character abbreviations are not unambiguous. PDT is UTC-7, so to capture the timestamp using strptime, your strings would have to be, for example,

Tue Apr 07 03:06:17 -0700 2009

What you can do to solve the issue depends on the data you have. If all your timestamps are in PDT, you can simply replace PDT in all your strings by -0700:

dates$Date <- gsub("PDT", "-0700", dates$Date)
dates$Date <- strptime(dates$Date, format = format)

G. Grothendieck · Answer 3 · 2021-10-03T18:35:31.653

1) The subject refers to a csv file but the data shown is not in csv form. We will assume that the form shown in the question is correct and the input file contains a first line of just 1, a second line describing a date/time, a third which is blank and so on as in the Note at the end.

We read it in line by line, and then prefix successive lines with num: , datetime: and an empty string. That will convert it to debian control format form (dcf) so we can use read.dcf to read it giving a character matrix. Convert that to a data frame, use convert.type to convert the first column to numeric and convert the second column to POSIXct.

The first two lines in the pipeline below are written to make use of Lines in the Note for reproducibility but to read the same from a file just replace the first two lines in the pipeline with something like this, "myfile" |> where myfile is the name of the input file.

No packages are used.

prefix <- \(x) paste(rep(c("num:", "time:", ""), length = length(x)), x)

Lines |>
  textConnection(name = "") |>
  readLines() |>
  trimws() |>
  prefix() |> 
  textConnection(name = "") |>
  read.dcf() |>
  as.data.frame() |>
  type.convert(as.is = TRUE) |>
  transform(time = as.POSIXct(time, format = "%a %b %d %H:%M:%S PDT %Y"))

giving this data.frame:

  num                time
1   1 2009-04-06 23:55:14
2   2 2009-04-07 01:16:43
3   3 2009-04-07 03:06:17

2) Another way to do this is to convert the data to a matrix as shown and then proceed as above.

Lines |>
  textConnection(name = "") |>
  read.table(sep = "?", strip.white = TRUE) |>
  unlist() |>
  matrix(ncol = 2, byrow = TRUE, dimnames = list(NULL, c("num", "time"))) |>
  as.data.frame() |>
  type.convert(as.is = TRUE) |>
  transform(time = as.POSIXct(time, format = "%a %b %d %H:%M:%S PDT %Y"))

3) If the format shown in the question is an error and it was meant that the 1, 2, 3, ... appears before the datetime on the same line like this then read it in line by line, replace the first space with a comma and read that into a data frame with the indicated column names. Finally convert the time column to POSIXct.

Lines2 <- "1 Mon Apr 06 23:55:14 PDT 2009
2 Tue Apr 07 01:16:43 PDT 2009
3 Tue Apr 07 03:06:17 PDT 2009"

Lines2 |>
  textConnection(name = "") |>
  readLines() |>
  trimws() |>
  sub(pattern = " ", replacement = ",") |>
  textConnection(name = "") |>
  read.table(sep = ",", col.names = c("num", "time"), strip.white = TRUE) |>
  transform(time = as.POSIXct(time, format = "%a %b %d %H:%M:%S PDT %Y"))

4) If the input is truly a csv file but with no column headers as shown in Lines3 then it is even easier:

Lines3 <- "1,Mon Apr 06 23:55:14 PDT 2009
2,Tue Apr 07 01:16:43 PDT 2009
3,Tue Apr 07 03:06:17 PDT 2009"

Lines3 |>
  textConnection(name = "") |>
  read.table(sep = ",", col.names = c("num", "time"), strip.white = TRUE) |>
  transform(time = as.POSIXct(time, format = "%a %b %d %H:%M:%S PDT %Y"))

Note

Lines <- "1
Mon Apr 06 23:55:14 PDT 2009

2
Tue Apr 07 01:16:43 PDT 2009

3
Tue Apr 07 03:06:17 PDT 2009"

strptime - set format with data in csv file

3 Answers3

data

Note