I swear I'm using the correct date format but I keep getting a parse error when loading into WEKA.

"MonFeb2116:00:00+0000"
"EEEMMMddHH:mm:ssZ"

Here is an example dataset:

@RELATION example

@ATTRIBUTE tweetid STRING 
@ATTRIBUTE timestamp DATE "EEEMMMddhh:mm:ssZ"
@ATTRIBUTE I NUMERIC
@ATTRIBUTE a NUMERIC
@ATTRIBUTE cool NUMERIC
@ATTRIBUTE foo NUMERIC
@ATTRIBUTE bar NUMERIC
@ATTRIBUTE temp NUMERIC
@ATTRIBUTE class {POS,NEG}

@DATA
39715973388828673,"MonFeb2116:00:00+0000",0,0,0,0,2,2,?
39716148329197568,"MonFeb2116:00:42+0000",0,1,0,0,0,1,?
39715973388828673,"MonFeb2116:00:51+0000",1,0,0,0,0,0,?
39723030380941312,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?
39723030531944448,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?
39723031433707520,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?

WEKA Error:

unparseable date "MonFeb2116:00:00+0000, read Token[MonFeb2116:00:00+0000], line 21

I have double-checked the pattern against the SimpleDateFormat API documentation. Am I missing something?

http://download.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html

EDIT -----------

@RELATION example

@ATTRIBUTE tweetid STRING 
@ATTRIBUTE timestamp DATE "EEE MMM dd hh:mm:ss Z"
@ATTRIBUTE I NUMERIC
@ATTRIBUTE a NUMERIC
@ATTRIBUTE cool NUMERIC
@ATTRIBUTE foo NUMERIC
@ATTRIBUTE love NUMERIC
@ATTRIBUTE temp NUMERIC
@ATTRIBUTE class {POS,NEG}

@DATA
39715973388828673,"Mon Feb 21 16:00:00 +0000",0,0,0,0,2,2,?
39716148329197568,"Mon Feb 21 16:00:42 +0000",0,1,0,0,0,1,?
39715973388828673,"Mon Feb 21 16:00:51 +0000",1,0,0,0,0,0,?
39723030380941312,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?
39723030531944448,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?
39723031433707520,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?

I reformatted the date so that the tokens are separated by spaces, but WEKA still fails to parse it.

bhalsall

2 Answers

I don't know whether it will sort everything out, but try changing hh (12-hour format) to HH (24-hour format). Even then, I'm not sure it will be able to read a day of the week and a month name with no spaces between them. Do you have to use the value in that format? Putting a space after the 3rd and 6th characters would help.

Jon Skeet
  • That is correct. I changed it, but it still doesn't parse. Is the mixture of presentations (text, number) likely to cause an issue? The timestamp format is taken from the Twitter API, with the spacing removed. I will try adding a space between the day (EEE) and month (MMM) to see if it makes a difference. – bhalsall Apr 18 '11 at 14:14

Which default locale are you using? With an English locale, the String "MonFeb2116:00:00+0000" should be parseable with the pattern "EEEMMMddHH:mm:ssZ". Note, however, that the year will default to 1970 if it is not present in the pattern or the parsed string. That is probably not what you really want.
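A minimal sketch of the point above, using plain java.text.SimpleDateFormat outside WEKA. The class and method names (CompactTimestampSketch, parse, yearOf) are hypothetical and exist only for illustration. The sketch pins the locale to Locale.ENGLISH rather than relying on the JVM default, and uses the default (lenient) parsing mode. WEKA may configure its own parser differently, so this only shows how the pattern and string behave in isolation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of WEKA.
final class CompactTimestampSketch {

    // Pattern quoted in the answer: 24-hour HH, no separators, no year.
    static final String PATTERN = "EEEMMMddHH:mm:ssZ";

    // Parses with an explicit English locale so that "Mon" and "Feb" are
    // recognised regardless of the JVM's default locale.
    static Date parse(String text) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable: " + text, e);
        }
    }

    // Returns the calendar year of the given date, read in UTC.
    static int yearOf(Date date) {
        Calendar calendar =
                Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ENGLISH);
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        Date parsed = parse("MonFeb2116:00:00+0000");
        // No year appears in the pattern or the text, so the parser falls
        // back to its default year.
        System.out.println("Parsed year: " + yearOf(parsed));
    }
}
```

Because the year is filled in by default, the weekday in the text is not guaranteed to agree with the resulting date, which is one more reason to keep the year in the timestamp.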

jarnbjo
  • I've amended the timestamp to include the year again and put a space between each token. The pattern "EEE MMM dd HH:mm:ss Z yyyy" now fully parses timestamps such as: Mon Feb 21 16:00:00 +0000 2011. Thanks for your help! – bhalsall Apr 18 '11 at 14:57
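The pattern from the final comment can be checked outside WEKA with a round trip: parse the timestamp, format it again with the same pattern, and compare the result with the input. This is only a sketch under a few assumptions: the class and method names (FullTimestampSketch, parse, format) are hypothetical, the locale is pinned to Locale.ENGLISH, and the formatter's time zone is set to UTC so that the "+0000" offset is reproduced on output.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of WEKA.
final class FullTimestampSketch {

    // Pattern reported as working in the final comment (includes the year).
    static final String PATTERN = "EEE MMM dd HH:mm:ss Z yyyy";

    // SimpleDateFormat is not thread-safe, so build a fresh one per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    // Parses a spaced timestamp that carries its year.
    static Date parse(String text) {
        try {
            return newFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable: " + text, e);
        }
    }

    // Formats a date back into the same textual layout, in UTC.
    static String format(Date date) {
        return newFormat().format(date);
    }

    public static void main(String[] args) {
        String input = "Mon Feb 21 16:00:00 +0000 2011";
        String roundTrip = format(parse(input));
        // If the pattern describes the string completely, both lines match.
        System.out.println(input);
        System.out.println(roundTrip);
    }
}
```

Running a check like this over a sample of timestamps before writing the ARFF file can catch format mismatches earlier than the WEKA loader does.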