
I have a data frame (df) in this format. I need to convert the timestamp column (tweetCreatedAt) into a date-time object so that I can manipulate the data further.

    tweetCreatedAt                comment_text
1   2014-05-17T00:00:49.000Z      @truthout: India Elects Hard-Right Hindu
2   2014-05-17T00:00:49.000Z      Narendra Modi is welcome to visit US !

Any help?

I have tried the following

df[,1] <- lapply(df[,1],function(x) as.POSIXct(x, '%Y-%m-%dT%H:%M:%S'))

But now I'm getting the dates only and not the actual time.

  • Try `as.POSIXct(yourdata[,1], format='%Y-%m-%dT%H:%M:%S')` or if you need a 'Date' class, just do `as.Date(yourdata[,1], '%Y-%m-%d')` – akrun Sep 04 '15 at 06:16
  • `Z` stands for UTC time - http://stackoverflow.com/questions/9706688/what-does-the-z-mean-in-unix-timestamp-120314170138z - so make sure you specify this: `as.POSIXct(data$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S', tz="UTC")` at the end – thelatemail Sep 04 '15 at 06:19
  • I'm getting the following error. 'Error in as.POSIXct.default(df$tweetCreatedAt, format = "%Y-%m-%dT%H:%M:%S", : do not know how to convert 'df$tweetCreatedAt' to class “POSIXct”' – Akshay Jangra Sep 04 '15 at 06:23
  • I couldn't reproduce the error. – akrun Sep 04 '15 at 06:25
  • I simply have no idea about this. The error is "do not know how to convert 'df$tweetCreatedAt' to class “POSIXct”" – Akshay Jangra Sep 04 '15 at 06:30
  • Can you try after specifying the `origin` i.e. `as.POSIXct(df1[,1], format='%Y-%m-%dT%H:%M:%S', origin='1970-01-01',tz='UTC')` – akrun Sep 04 '15 at 06:33
  • According to the format and to `?strptime` a shorter format is `%FT%T`. A `dput(head(df))` would help giving the class of the tweetCreatedAt column too. – Tensibai Sep 04 '15 at 07:33
  • Not helping either :/ – Akshay Jangra Sep 04 '15 at 08:48
  • Did you get that error by applying @akrun 's solution to your initial dataset, or to the example you provided? I have a feeling that some elements of your "tweetCreatedAt" column are lists. – AntoniosK Sep 04 '15 at 10:09
  • Nope. No luck. They might be list. Can we work them out? – Akshay Jangra Sep 04 '15 at 11:29
  • @AkshayJangra That's why I added in my comment: "A dput(head(df)) would help giving the class of the tweetCreatedAt column too" ... – Tensibai Sep 04 '15 at 14:29

1 Answer


I'm not sure this is the problem, but it is one possible cause. As I mentioned in my comment, the elements of a column can be plain values or lists, depending on the process that generated the dataset. Check this example:

# simplified example
dt = read.table(text = "tweetCreatedAt  comment_text
1   2014-05-17T00:00:49.000Z      @truthout 
2   2014-05-19T00:00:49.000Z     Narendra", header=T)

dt$tweetCreatedAt = as.character(dt$tweetCreatedAt)

# data set looks like
dt

#             tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z    @truthout
# 2 2014-05-19T00:00:49.000Z     Narendra

as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')

# [1] "2014-05-17 00:00:49 BST" "2014-05-19 00:00:49 BST"


# let's manually change this element to a list
dt$tweetCreatedAt[2] = list(c("2014-05-19T00:00:49.000Z","2014-05-20T00:00:49.000Z"))

# data set now looks like this
dt

#                                       tweetCreatedAt comment_text
# 1                           2014-05-17T00:00:49.000Z    @truthout
# 2 2014-05-19T00:00:49.000Z, 2014-05-20T00:00:49.000Z     Narendra

as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')

# Error in as.POSIXct.default(dt$tweetCreatedAt, format = "%Y-%m-%dT%H:%M:%S") : 
#   do not know how to convert 'dt$tweetCreatedAt' to class “POSIXct”
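
If the column does turn out to be a list, one possible way forward is sketched below. It first checks the class and the number of values per row, then keeps only the first timestamp of each row before converting. This is a rough sketch, not a definitive fix: keeping the first value is an assumption that may not suit your data, and the names `n_per_row`, `first_stamp` and `created` are made up for illustration. `tz = "UTC"` follows the suggestion in the comments, since the trailing `Z` marks UTC.

```r
# Hypothetical data frame whose tweetCreatedAt column is a list,
# mirroring the situation in the example above
dt <- data.frame(comment_text = c("@truthout", "Narendra"),
                 stringsAsFactors = FALSE)
dt$tweetCreatedAt <- list("2014-05-17T00:00:49.000Z",
                          c("2014-05-19T00:00:49.000Z",
                            "2014-05-20T00:00:49.000Z"))

# Diagnose: a list column reports class "list" instead of "character"
class(dt$tweetCreatedAt)

# Count how many values each row holds; anything above 1 needs a decision
n_per_row <- lengths(dt$tweetCreatedAt)
n_per_row

# Assumption: keep only the first timestamp of each row
first_stamp <- vapply(dt$tweetCreatedAt,
                      function(x) as.character(x[[1]]),
                      character(1))

# Convert the flattened character vector; the trailing ".000Z" is ignored
# by the format string, so the time zone is set explicitly
dt$created <- as.POSIXct(first_stamp,
                         format = "%Y-%m-%dT%H:%M:%S",
                         tz = "UTC")
class(dt$created)
```

If every row should keep all of its timestamps instead, the rows with more than one value would have to be expanded into several rows before converting, which this sketch does not attempt.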
AntoniosK