0

I have an input string as below:

john is a StartDate 10\11\2012 EndDate 15\11\2012 john is a boy john is StartDate john

I want to extract the two dates, StartDate and EndDate, from the above string.

However, I cannot just search for the word StartDate because, as seen towards the end of the string, StartDate may appear as an independent word. I cannot take the first instance either, because there is no guarantee that the StartDate followed by the dates will always come first.

So the solution would be to search for the pattern StartDate % EndDate % as a whole, i.e. the words StartDate and EndDate together.

What would be the best way to achieve this?

One solution I can think of: for every instance of the word StartDate, take the substring of the next four words (including StartDate) and search for the word EndDate in that substring. If it exists, we have the correct substring; otherwise, move on to the next instance of StartDate and repeat.
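A minimal sketch of that four-word window idea, assuming the words are separated by whitespace and that the two dates always sit directly after StartDate and EndDate. The class name `WindowSearch` and the helper `findDates` are hypothetical names used only for illustration:

```java
import java.util.Arrays;

class WindowSearch {
    // Hypothetical helper: returns {startDate, endDate}, or null when no
    // "StartDate <date> EndDate <date>" window exists in the input.
    static String[] findDates(String input) {
        String[] words = input.trim().split("\\s+");
        // A full window is four words: StartDate, date, EndDate, date.
        for (int i = 0; i + 3 < words.length; i++) {
            if (words[i].equals("StartDate") && words[i + 2].equals("EndDate")) {
                return new String[] { words[i + 1], words[i + 3] };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 "
                + "john is a boy john is StartDate john";
        System.out.println(Arrays.toString(findDates(input)));
    }
}
```

A lone StartDate (like the one near the end of the sample) is skipped because the word two positions after it is not EndDate.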

Vicky

3 Answers

1

A quick and dirty way to extract with regex (replaceFirst):

String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 john is a boy john is StartDate john";

String startDate = input.replaceFirst(".*(StartDate \\d{1,2}\\\\\\d{1,2}\\\\\\d{4}).*", "$1");
String endDate = input.replaceFirst(".*(EndDate \\d{1,2}\\\\\\d{1,2}\\\\\\d{4}).*", "$1");

System.out.println(startDate);
System.out.println(endDate);

If you want only the dates:

String startDate = input.replaceFirst(".*StartDate (\\d{1,2}\\\\\\d{1,2}\\\\\\d{4}).*", "$1");
String endDate = input.replaceFirst(".*EndDate (\\d{1,2}\\\\\\d{1,2}\\\\\\d{4}).*", "$1");
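One caveat with the replaceFirst approach: when the pattern does not match, `replaceFirst` returns the input unchanged, so the "date" would silently be the whole string. Also, because the leading `.*` is greedy, the last matching occurrence in the input is the one captured. The sketch below guards against the no-match case; `ReplaceFirstCheck` and `dateAfter` are hypothetical names, not part of any library:

```java
import java.util.regex.Pattern;

class ReplaceFirstCheck {
    // Day\Month\Year with literal backslashes, as in the sample input.
    static final String DATE = "\\d{1,2}\\\\\\d{1,2}\\\\\\d{4}";

    // Hypothetical helper: returns the date that follows the keyword,
    // or null when the keyword is not followed by a date.
    static String dateAfter(String keyword, String input) {
        String regex = ".*" + Pattern.quote(keyword) + " (" + DATE + ").*";
        // replaceFirst leaves the input untouched on no match, so test first.
        return input.matches(regex) ? input.replaceFirst(regex, "$1") : null;
    }

    public static void main(String[] args) {
        String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 "
                + "john is a boy john is StartDate john";
        System.out.println(dateAfter("StartDate", input));
        System.out.println(dateAfter("EndDate", input));
        System.out.println(dateAfter("StartDate", "john is StartDate john"));
    }
}
```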
nhahtdh
  • Please see my comment in Keppil's answer... For your answer too, I am not getting the correct date. The output I am getting is 10 ?2 – Vicky Jul 11 '12 at 06:30
  • @NikunjChauhan: I tested before posting this. It should work at least for the sample input you provided. If you want a better answer, please provide more test cases. – nhahtdh Jul 11 '12 at 06:32
  • Your answer is correct. The issue is that I did not include escape characters in my input. I have to read the input from a file, where escape characters will not be present. I am checking for it. Thanks! – Vicky Jul 11 '12 at 06:36
0

Use a regular expression to match the dates.

Regex: .*?StartDate[ ]+(\d{2}\\\d{2}\\\d{4})[ ]+EndDate[ ]+(\d{2}\\\d{2}\\\d{4}).*

  • In the above regex, the first captured group is the start date and the second captured group is the end date.

Refer to the following link to learn how to use regex in Java: http://docs.oracle.com/javase/tutorial/essential/regex/
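As a rough sketch of how the regex above could be used from Java: the version below drops the leading `.*?` and trailing `.*`, since `Matcher.find()` already searches anywhere in the input, and doubles every backslash because the pattern is written as a Java string literal. The class name `CombinedDateRegex` and the method `extract` are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedDateRegex {
    // StartDate and EndDate must appear together, each followed by dd\mm\yyyy.
    static final Pattern RANGE = Pattern.compile(
            "StartDate[ ]+(\\d{2}\\\\\\d{2}\\\\\\d{4})[ ]+EndDate[ ]+(\\d{2}\\\\\\d{2}\\\\\\d{4})");

    // Hypothetical helper: returns {startDate, endDate}, or null if no match.
    static String[] extract(String input) {
        Matcher m = RANGE.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 "
                + "john is a boy john is StartDate john";
        System.out.println(Arrays.toString(extract(input)));
    }
}
```

Note that `\d{2}` requires exactly two digits for day and month; `\d{1,2}` would be needed if single-digit values can occur.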

18bytes
0

I would go for a simple regex, since your pattern is so well defined:

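import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 john is a boy john is StartDate john";
Matcher matcher = Pattern.compile("StartDate (.*?) EndDate (.*?) ").matcher(input);
// Both stay null if the StartDate ... EndDate pair is not found.
String startDate = null;
String endDate = null;
if (matcher.find()) {
  startDate = matcher.group(1);
  endDate = matcher.group(2);
}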
Keppil
  • It's not giving the proper output. The output I am getting for the start date is: 10 ?2 – Vicky Jul 11 '12 at 06:28
  • @NikunjChauhan: That's because you haven't escaped your `\ `s in the input. You need to write like this: `String input = "john is a StartDate 10\\11\\2012 EndDate 15\\11\\2012 john is a boy john is StartDate john"`; – Keppil Jul 11 '12 at 06:29
  • The input will be coming in a file. And the file will have string as I have mentioned in my question. Without escape character. Let me recheck at my end if that will cause some issue. – Vicky Jul 11 '12 at 06:32
  • @NikunjChauhan: If the string comes that way from file you should be fine. You only need to escape your `\ `s when you declare the string yourself. – Keppil Jul 11 '12 at 06:35
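To illustrate the escaping point from the comments: in Java source, an unescaped `"10\11\2012"` still compiles, but `\11` and `\201` are read as octal escapes (a tab and the character U+0081), which would explain output like `10 ?2`. Text read from a file goes through no such escape processing. A small sketch, with the class name `BackslashEscapes` chosen only for illustration:

```java
class BackslashEscapes {
    // Escaped literal: each \\ in the source is one backslash in the string.
    static final String ESCAPED = "10\\11\\2012";
    // Unescaped literal: \11 and \201 are parsed as octal escapes.
    static final String UNESCAPED = "10\11\2012";

    public static void main(String[] args) {
        System.out.println(ESCAPED);            // 10\11\2012
        System.out.println(ESCAPED.length());   // 10
        System.out.println(UNESCAPED.length()); // 5
    }
}
```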