0

I am scraping a webpage that contains dates in this format: "8th November 2013". Once scraped, the dates end up in an unordered array of strings. How can I convert these strings to a simple date format such as yyyy-MM-dd, so that I can sort them and use them with the calendar?

k-prog
  • Does this thread get you started: http://stackoverflow.com/questions/4011075/how-do-you-format-the-day-of-the-month-to-say-11th-21st-or-23rd-in-java – EdgeCase Nov 08 '13 at 17:19
  • Yeah, if I did it this way I would have to write an entire class to parse and format the date strings to my liking. Very time-consuming, but it seems it may be my only choice? – k-prog Nov 08 '13 at 17:34

4 Answers

1

How about something like this?

private String dateLongStringConvert(String dateLongString) {

    // split long date string into string array: [day, month, year]
    String[] dateArray = dateLongString.split(" ");

    // get day of month as an integer (strip out non numeric chars such as "th")
    int dayOfMonth = Integer.parseInt(dateArray[0].replaceAll("\\D+", ""));

    // Convert month string to number. Each case needs a break;
    // without it execution falls through and month always ends up as "12".
    String month;
    switch (dateArray[1]) {
        case "January":
            month = "01";
            break;
        case "February":
            month = "02";
            break;
        case "March":
            month = "03";
            break;
        case "April":
            month = "04";
            break;
        case "May":
            month = "05";
            break;
        case "June":
            month = "06";
            break;
        case "July":
            month = "07";
            break;
        case "August":
            month = "08";
            break;
        case "September":
            month = "09";
            break;
        case "October":
            month = "10";
            break;
        case "November":
            month = "11";
            break;
        case "December":
            month = "12";
            break;
        default:
            // fail loudly instead of returning a date with an empty month
            throw new IllegalArgumentException("Unknown month: " + dateArray[1]);
    }
    // return formatted date string (yyyy-MM-dd)
    return dateArray[2] + "-" + month + "-" + String.format("%02d", dayOfMonth);
}
wyoskibum
  • This worked, but I had to make a couple of changes. I had to remove the switch on String, because switching on String objects is a new feature introduced in Java 1.7 and, unfortunately, Android requires version 1.6 or 1.5. Instead I used multiple if/else statements. Also, when splitting the string into an array I had to use split("\\s+") to match the whitespace. Thanks very much for your help! – k-prog Nov 12 '13 at 15:37
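
A minimal sketch of the Java 1.6-friendly variant described in the comment above, assuming the input is always "day month year" with English month names. It swaps the if/else chain for a list lookup, which is a different technique from the one the commenter used. The class name `OrdinalDateLookup` and the method name `toIsoDate` are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class OrdinalDateLookup {
    // English month names in calendar order; list index + 1 is the month number.
    private static final List<String> MONTHS = Arrays.asList(
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December");

    /** Converts e.g. "8th November 2013" to "2013-11-08" (hypothetical helper). */
    static String toIsoDate(String longDate) {
        // Split on any run of whitespace, as suggested in the comment above.
        String[] parts = longDate.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'day month year': " + longDate);
        }
        // Drop every non-digit so the ordinal suffix (st, nd, rd, th) disappears.
        int day = Integer.parseInt(parts[0].replaceAll("\\D+", ""));
        // indexOf returns -1 for an unknown name, so month becomes 0 in that case.
        int month = MONTHS.indexOf(parts[1]) + 1;
        if (month == 0) {
            throw new IllegalArgumentException("Unknown month: " + parts[1]);
        }
        int year = Integer.parseInt(parts[2]);
        // Locale.US keeps the digits ASCII regardless of the device locale.
        return String.format(Locale.US, "%04d-%02d-%02d", year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("8th November 2013")); // prints 2013-11-08
    }
}
```

Because the result is zero-padded yyyy-MM-dd, sorting the strings alphabetically also sorts them chronologically.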
1
String inputDate = "8th November 2013";
inputDate = inputDate.replaceAll("(\\d)(st|nd|rd|th|\\.)", "$1"); // strip the ordinal suffix (or a trailing dot) after the day number
Date date = new SimpleDateFormat("d MMMM yyyy", Locale.ENGLISH).parse(inputDate); // parse input date (MMMM = full month name)
String outputDate = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH).format(date); // format to output date
Squ1sh
  • I tried this and it removes the "th" fine, but when it tries to convert the result I get the following exception: Unparseable date: "8 November 2013" – k-prog Nov 12 '13 at 14:49
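
For the "Unparseable date" error, one thing worth trying is a pattern that spells out the full month name (MMMM) and a four-digit year (yyyy). The sketch below assumes English month names and Java 1.6-level APIs; it also sorts the parsed dates, since that was the goal of the question. The class name `OrdinalDateSorter` and the method name `toSortedIsoDates` are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;

final class OrdinalDateSorter {
    /** Parses strings like "8th November 2013" and returns them sorted as yyyy-MM-dd. */
    static List<String> toSortedIsoDates(List<String> scraped) {
        // MMMM = full month name; Locale.ENGLISH so "November" parses on any device locale.
        SimpleDateFormat input = new SimpleDateFormat("d MMMM yyyy", Locale.ENGLISH);
        SimpleDateFormat output = new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH);

        List<Date> dates = new ArrayList<Date>();
        for (String raw : scraped) {
            String cleaned = raw.trim()
                    .replaceAll("\\s+", " ")                  // collapse runs of whitespace
                    .replaceAll("(\\d)(st|nd|rd|th)", "$1");  // "8th" -> "8", "22nd" -> "22"
            try {
                dates.add(input.parse(cleaned));
            } catch (ParseException e) {
                // Rethrow unchecked so callers see which string was bad.
                throw new IllegalArgumentException("Unparseable date: " + raw, e);
            }
        }

        Collections.sort(dates); // Date is Comparable, so this is chronological order

        List<String> result = new ArrayList<String>();
        for (Date d : dates) {
            result.add(output.format(d));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> scraped = Arrays.asList(
                "8th November 2013", "1st March 2012", "22nd December 2013");
        System.out.println(toSortedIsoDates(scraped));
        // prints [2012-03-01, 2013-11-08, 2013-12-22]
    }
}
```

Keeping the intermediate `Date` objects around, rather than only the formatted strings, may be convenient if the values are later handed to `java.util.Calendar` via `setTime`.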
0

The proper way to do this is to use a parser such as the Stanford Temporal Tagger and have it extract the dates from the text. The team provides a nice GUI (http://nlp.stanford.edu:8080/sutime/process) for evaluating the tool.

Karthik
0

to_char( 'YYYY/MM/DD HH24:MI:ss')

user2956373