
I am trying to store a text file that has two columns, date and time respectively, something like this: 1999-01-01 12:08:56

Now I want to perform some date operations using Pig, but I want to store the date and time like this: 1999-01-01T12:08:56 (I checked this link): http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

What I want to know is which format I can use so that my date and time are in one column that I can feed to Pig, and then how to load that value into Pig. I know we can convert it to datetime, but it is showing errors. Can somebody kindly tell me how to load date and time data together? An example would be of great help.

CodeReaper
  • I do not know anything about Apache PIG. But if your question is simply how to take a string representing a date and a string representing a time-of-day, combine the two into a string to be parsed as a date-time value (object), possibly adjust that value to another time zone such as UTC, and then serialize that value to a string in a different format to represent the combined date-time value… well, that has been covered in at least a thousand questions and answers on StackOverflow. And I just gave you the key words you need to search for, along with 'joda' and 'java.time'. – Basil Bourque Sep 23 '14 at 18:38
  • possible duplicate of [Parse Date String to Some Java Object](http://stackoverflow.com/questions/8854780/parse-date-string-to-some-java-object) – Basil Bourque Sep 23 '14 at 18:41

1 Answer


Please let me know if this works for you.

input.txt  
1999-01-01 12:08:56  
1999-01-02 12:08:57  
1999-01-03 12:08:58  
1999-01-04 12:08:59  

PigScript:  
A = LOAD 'input.txt' using PigStorage(' ') as(date:chararray,time:chararray);  
B = FOREACH A GENERATE CONCAT(date,'T',time) as myDateString;  
C = FOREACH B GENERATE ToDate(myDateString);  
dump C;  

Output:  
(1999-01-01T12:08:56.000+05:30)  
(1999-01-02T12:08:57.000+05:30)  
(1999-01-03T12:08:58.000+05:30)  
(1999-01-04T12:08:59.000+05:30)  

Now myDateString has been converted to a datetime object, so you can process this data using all the built-in date functions.
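If you would rather keep the date and time together in one column, a minimal sketch along the following lines may work. It assumes the same input.txt as above, reads each line as a single chararray with TextLoader, and passes an explicit Joda-style pattern to ToDate. The relation and field names (raw, parsed, parsed_tz, line, dt) are only illustrative. The offset shown in the output (+05:30 here, -08:00 on another machine) usually reflects the default time zone of the host running Pig; the three-argument form of ToDate pins it explicitly.

```pig
-- Read every line as one chararray (date and time stay in a single column).
raw = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Parse with an explicit pattern; the offset comes from the default time zone.
parsed = FOREACH raw GENERATE ToDate(TRIM(line), 'yyyy-MM-dd HH:mm:ss') AS dt;

-- Same parse, but with the time zone stated explicitly instead of inherited.
parsed_tz = FOREACH raw GENERATE ToDate(TRIM(line), 'yyyy-MM-dd HH:mm:ss', '+05:30') AS dt;

DUMP parsed_tz;
```

The default time zone can reportedly also be set once per script through the pig.datetime.default.tz property; check the documentation for your Pig version before relying on it.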

In case you want to store the output in this format:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)

you can use REGEX_EXTRACT to keep everything in each value up to the ".", something like this:

D = FOREACH C GENERATE ToString($0) as temp;
E = FOREACH D GENERATE REGEX_EXTRACT(temp, '(.*)\\.(.*)', 1);
dump E;

Output:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)  
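As an alternative to the regex, ToString also accepts a format string, so the fraction and offset can be dropped in one step. This is a sketch that assumes the datetime field is given an explicit name (dt here, which is not in the script above) instead of being referenced as $0. Note that the result is a chararray, not a datetime, so the date functions should be applied to the datetime field before this conversion.

```pig
-- Name the datetime field so later statements do not depend on $0.
C = FOREACH B GENERATE ToDate(myDateString) AS dt;

-- Format the datetime directly; the quoted T is a literal in the pattern.
D = FOREACH C GENERATE ToString(dt, 'yyyy-MM-dd\'T\'HH:mm:ss') AS dateString;

DUMP D;
```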
Sivasakthi Jayaraman
  • Hi....when I use the ToDate function I get the output this: (1999-01-01T12:08:56.000-08:00) (2011-03-19T19:07:43.000-07:00) (2008-09-25T21:08:31.000-07:00) (2014-11-30T11:11:21.000-08:00) (1978-12-13T20:32:31.000-08:00) (2010-11-21T17:33:34.000-08:00) (2010-10-24T22:34:43.000-07:00) (2007-09-27T18:21:44.000-07:00) Can you tell me why am I not getting +05:30 in my output ? How can i change the timezone? Also one more thing, when I have the data after using REGEX like this: (1999-01-01T12:08:56) (1999-01-02T12:08:57) Can i still use all my Date time functions on this extracted data? – CodeReaper Sep 24 '14 at 03:58
  • Hi....ToString($0) is not working . It is throwing this error: grunt> D = FOREACH C GENERATE ToString($0) as temp; 2014-09-23 21:09:24,305 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.ToString as multiple or none of them fit. Please use an explicit cast. – CodeReaper Sep 24 '14 at 04:30