
I want to build an application that receives user-defined Strings of unknown length and classifies each one as either a plain String or a date. To do that, I need a way to extract the date pattern of a String without knowing in advance whether the String is actually a date.

More precisely, if the received String is

"2014-05-07_0533" //valid date

the program will return

"yyyy-MM-dd_HHmm"

But if the received String is

"2014-05_667788" //not a valid date

the program will raise an Exception or otherwise inform the user that the supplied String does not follow a known date pattern and is therefore not a date.

One way I can think of to implement this is to declare a predefined List of accepted date patterns and then exhaustively try to match the supplied String against each of them. On the first match, the program returns the matched pattern. If no pattern matches, it returns null or a message.

The problem with this idea is that the program will receive tens or hundreds of thousands of Strings, so I'm starting to think that this method will have a significant impact on the application's speed and overall memory footprint. Is there a better solution?

EDIT

There is no code so far, as I am in the early stages of development and am just running through some ideas on how to implement it.

EDIT2

For those of you who requested a code sample, this is what I have come up with so far:

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

public class DateMatching {

    public List<String> datePatterns = new ArrayList<>();

    public DateMatching() {
        initializePatterns();
    }

    private void initializePatterns() {
        datePatterns.add("yyyy-MM-dd_HH:mm");
        datePatterns.add("yyyy/MM/dd_HH:mm");
        datePatterns.add("yyyy.MM.dd_HH:mm");
        //and so on...
    }

    public final String getDatePattern(String supplied) {
        for (String datePattern : datePatterns) {
            DateFormat format = new SimpleDateFormat(datePattern);
            // Strict parsing: reject out-of-range fields such as month 13
            format.setLenient(false);
            try {
                format.parse(supplied);
                return datePattern;
            } catch (ParseException e) {
                // Not this pattern; try the next one
            }
        }
        return null; //No matched pattern
    }
}

As the datePatterns list could contain 50 or more patterns and the application may receive tens or hundreds of thousands of Strings, I'm trying to find a way to reduce the time the matching process takes for all these Strings, assuming there is one to begin with.

Lefteris008
  • Check [*jchronic*](https://github.com/samtingleff/jchronic) (see [parser test](https://github.com/samtingleff/jchronic/blob/master/src/test/java/com/mdimension/jchronic/ParserTest.java)). SO thread link is [here](http://stackoverflow.com/questions/3850784/recognise-an-arbitrary-date-string). – Wiktor Stribiżew Jun 17 '16 at 08:20
  • How do the patterns differ from each other? Is it possible to implement some logic so that the input always has the same format? – wake-0 Jun 17 '16 at 08:22
  • @Sanjeev The question is more of a request to propose me an improvement than to supply me code. – Lefteris008 Jun 17 '16 at 08:22
  • @KevinWallis That's what I'm asking. Is it possible to get whichever pattern a String contains and automatically find if it could be a date? – Lefteris008 Jun 17 '16 at 08:23
  • Is there a "down-vote" bounty and you are rushing to just down-vote any post that does not supply code without even reading the question of the OP? I thought that Stack Overflow was there to provide analytical answers to algorithmic questions too, not just correct the bugs of newbies on programming... – Lefteris008 Jun 17 '16 at 08:35
  • @Lefteris008 I would suggest starting with a simple approach (store the patterns in a list, check the input string against each pattern until one matches). If performance is good enough, move on; if it isn't, you can start optimising your algorithm. – assylias Jun 17 '16 at 08:37
  • Can you show the full list of valid patterns? You could maybe use a `Map` containing as keys regular expressions and as values the list of formats that match the regex. You would probably be able to have a small number of regexes that cover all the possible patterns. That way you would be able to reject incorrect strings faster and only test the formats that "make sense" on valid strings. – assylias Jun 17 '16 at 09:20
  • @assylias 1/2 That's what I'm thinking too but I hesitated to write it: some general regular expressions that will reduce the matching time. I don't currently have the full list of date patterns, I'm still designing the application. I just need to support as many as I can. – Lefteris008 Jun 17 '16 at 09:23
  • @assylias 2/2 On the other hand, even with the regular expression, the problem is not completely solved. I mean, I would just match a String as being a date but I wouldn't be able to get its actual date pattern. For example, if I use a regular expression that matches 10 date formats, the regex would just match a valid String; there's no way of extracting the specific date pattern from it. – Lefteris008 Jun 17 '16 at 09:26
  • @Lefteris008 yup - but in the end, unless you parse the string manually (that's a whole project in itself, and certainly error-prone), you will have to test it against a number of patterns... The easy way would be to restrict the number of valid patterns - but you may not be able to do that. – assylias Jun 17 '16 at 09:55
  • @assylias That's correct and this is what I'm looking for, but though this solves my initial problem, it actually defeats the purpose of the application: to return the specific pattern the date follows. So, I'm assuming that there is no way of implementing what I'm asking for; I will move on to building the application as others and I suggested, by exhaustively trying to match the String against a number of pre-defined patterns. Thanks for your time. – Lefteris008 Jun 17 '16 at 11:51
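
The regex idea from the comments can be combined with pattern extraction: a coarse regex only narrows the candidate list, and the formatter that then parses the input identifies the exact pattern. The following is a minimal sketch under assumptions, not code from the thread. The class name `RegexPrefilterDetector`, the `register` and `detect` helpers, the two regexes and the pattern list are all hypothetical, and `java.time` (Java 8+) is assumed to be available.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: a coarse regex selects candidates, a formatter confirms one.
final class RegexPrefilterDetector {

    // Coarse regex -> candidate date patterns (both are illustrative assumptions).
    private static final Map<Pattern, List<String>> CANDIDATES = new LinkedHashMap<>();
    // Date pattern -> formatter, built once and reused for every input.
    private static final Map<String, DateTimeFormatter> FORMATTERS = new HashMap<>();

    static {
        register("\\d{4}[-/.]\\d{2}[-/.]\\d{2}_\\d{2}:\\d{2}",
                "yyyy-MM-dd_HH:mm", "yyyy/MM/dd_HH:mm", "yyyy.MM.dd_HH:mm");
        register("\\d{4}-\\d{2}-\\d{2}_\\d{4}", "yyyy-MM-dd_HHmm");
    }

    private static void register(String regex, String... patterns) {
        CANDIDATES.put(Pattern.compile(regex), Arrays.asList(patterns));
        for (String p : patterns) {
            FORMATTERS.put(p, DateTimeFormatter.ofPattern(p));
        }
    }

    /** Returns the matching date pattern, or empty if the input fits none. */
    static Optional<String> detect(String input) {
        for (Map.Entry<Pattern, List<String>> e : CANDIDATES.entrySet()) {
            if (!e.getKey().matcher(input).matches()) {
                continue; // Wrong shape: skip every pattern in this group.
            }
            for (String p : e.getValue()) {
                try {
                    FORMATTERS.get(p).parse(input);
                    return Optional.of(p);
                } catch (DateTimeParseException ex) {
                    // Right shape, wrong pattern; try the next candidate.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("2014.05.07_05:33"));
        System.out.println(detect("2014-05_667788"));
    }
}
```

Strings that are not dates are rejected by the regexes alone, so no exception is thrown for them; the parse attempts are limited to the few patterns sharing the input's shape.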

2 Answers


Tens of thousands of Strings is not a huge amount.

I would just try to parse it and catch exceptions:

import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd_HHmm");
public static boolean isValidDate(String input) {
  try {
    FMT.parse(input);
    return true;
  } catch (DateTimeParseException e) {
    return false;
  }
}

Running the method 10,000 times takes less than 100 ms on my machine (without even allowing the JVM to warm up, etc.).

assylias
  • Correct, that's what I have thought of so far, but is there a different way to implement it rather than constructing a relatively big `if-then-else` statement that will try to match the String with all the available date patterns? – Lefteris008 Jun 17 '16 at 08:26
  • @Lefteris008 Create a `Set` that contains all allowed patterns and check the given pattern against the entries of the `Set` ... just a simple approach. – wake-0 Jun 17 '16 at 08:28
  • @KevinWallis I don't think that there's a point in using a `Set` over a `List` as I still need to iterate over all of the date patterns and try to match them one by one with the supplied String. But thanks anyway! – Lefteris008 Jun 17 '16 at 08:50
  • @Lefteris008 If you can find the pattern from the given input without a problem, you can use the `Set`; that was what I had in mind. – wake-0 Jun 17 '16 at 08:52
  • @KevinWallis Yes, but to find the pattern in the first place, I have to exhaustively try to match the String with all the pre-defined date patterns I will declare. When I have a match, I would just return the matched pattern. – Lefteris008 Jun 17 '16 at 08:54

As the others have suggested, there is no way of implementing what I'm asking for. The application needs to know from the outset which patterns it is going to look for in the received String; it cannot magically guess that a String is a date without prior knowledge of how a date is assembled. So, I will declare a List of predefined date patterns and, every time a String arrives, try to match it against that List. I am closing the issue, thanks for all the answers!
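
The list-based approach described here could look roughly like the sketch below. It is an illustration under assumptions rather than the final implementation: the class name `DatePatternDetector`, the `detect` method and the pattern list are hypothetical, and it uses `java.time` (Java 8+) so that each formatter is built once and reused instead of being created for every String.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: tries each predefined pattern in order.
final class DatePatternDetector {

    // DateTimeFormatter is immutable and thread-safe, so one instance per pattern suffices.
    private static final Map<String, DateTimeFormatter> FORMATTERS = new LinkedHashMap<>();

    static {
        // Illustrative pattern list; the real list is up to the application.
        String[] patterns = {
            "yyyy-MM-dd_HHmm", "yyyy-MM-dd_HH:mm", "yyyy/MM/dd_HH:mm", "yyyy.MM.dd_HH:mm"
        };
        for (String p : patterns) {
            FORMATTERS.put(p, DateTimeFormatter.ofPattern(p));
        }
    }

    /** Returns the first pattern that parses the whole input, or empty if none does. */
    static Optional<String> detect(String input) {
        for (Map.Entry<String, DateTimeFormatter> e : FORMATTERS.entrySet()) {
            try {
                e.getValue().parse(input); // Throws unless the entire text matches.
                return Optional.of(e.getKey());
            } catch (DateTimeParseException ex) {
                // Not this pattern; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("2014-05-07_0533"));
        System.out.println(detect("2014-05_667788"));
    }
}
```

Returning an `Optional` instead of `null` is a design choice of this sketch; the caller can map an empty result to an Exception or a message as the question describes.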

Lefteris008