
I want to parse a string like this: "The time is 12-jan-20 12:11:32.1112 AM"

I used this regex to capture the timestamp: (?s)^.?(?:\s|^)?(?<time>(?<!\d)(?:0?\d[012]?)(?:-|:|\.)[0-5]\d(?:-|:|\.)[0-5]\d.*(?:a|A|p|P)(?:m|M))

See https://regex101.com/r/IQxXsJ/8

The group 'time' contains 12:11:32.1112 AM, but I don't want the millisecond value (.1112) in the group.

I want the group to contain 12:11:32 AM.

Is it possible to exclude it while capturing with the regex?

Selva
  • You could make it 2 capturing groups and use those in the replacement: https://regex101.com/r/p11MMn/1 – The fourth bird Jan 22 '20 at 15:59
  • I cannot use replacements in my case. Is it possible with the regex itself? – Selva Jan 22 '20 at 16:02
  • Would it not be easier to pass the Date/Time string to a Date/Time formatter/parser in the target programming language? – WJS Jan 22 '20 at 16:06
  • @WJS Yes, it is a bit difficult. – Selva Jan 22 '20 at 16:08
  • You could also use the 2 groups instead of replacing the same line. Can you add the tool or language tag to the question? – The fourth bird Jan 22 '20 at 16:11
  • @Thefourthbird I am maintaining it as a standard format in my environment. That's why I want the 'time' group to have the structure I described in the question – Selva Jan 22 '20 at 16:13
  • It might be easier to come up with multiple regular expressions that are mutually exclusive and address the possible outcomes. Then try to match on each one until a match is found, and grab the captured values. You could even read the regular expressions from a database or a separately compiled class to allow for future additions without having to change the rest of the code. – WJS Jan 22 '20 at 16:48
  • See this answer to a similar question, @Selva: https://stackoverflow.com/a/277565/12689629 – Zaelin Goodman Jan 22 '20 at 19:16
  • Does this answer your question? [Regular expression to skip character in capture group](https://stackoverflow.com/questions/277547/regular-expression-to-skip-character-in-capture-group) – Zaelin Goodman Jan 22 '20 at 19:32
  • Yes, @ZaelinGoodman, Thanks – Selva Jan 23 '20 at 00:04

1 Answer

You can do it like this in Java.

        String text = "The time is 12-jan-20 12:11:32.1112 AM";

        text = text.replaceAll(".* (\\d+:\\d+:\\d+).*(..)$", "$1 $2");

        System.out.println(text); // prints 12:11:32 AM

  1. Capture the time separated by colons. Because the leading .* is greedy, this is the last such occurrence that follows a space.
  2. Capture the last two characters before the end of the string.
  3. Replace the entire string with the back-references ($1 $2) of the captured components.

replaceAll is Java-specific, but most regex engines can capture this way. You don't need to parse the entire string. Just look for the pieces you need and capture them, ignoring everything else.
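If a replacement is not wanted, the same two pieces can be read straight from the groups of a Matcher. A single group always captures one contiguous span of the input, so the fractional seconds are matched by a non-capturing part that sits between two groups. This is only a minimal sketch: the class name TimeGroups, the method extractTime, and the stricter pattern are illustrative assumptions, not part of the question's regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeGroups {
    // Group 1: h:mm:ss. The optional fractional part is matched but not captured.
    // Group 2: the AM/PM marker in whatever case it appears.
    private static final Pattern TIME = Pattern.compile(
            "(?<!\\d)(\\d{1,2}:[0-5]\\d:[0-5]\\d)(?:\\.\\d+)?\\s*([AaPp][Mm])");

    /** Returns "h:mm:ss AM/PM" without fractional seconds, or null if nothing matches. */
    static String extractTime(String text) {
        Matcher m = TIME.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        // prints 12:11:32 AM
        System.out.println(extractTime("The time is 12-jan-20 12:11:32.1112 AM"));
    }
}
```

The caller still has to join the two groups, so this helps only where the code reading the match can be adjusted.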

The replaceAll call above also works on other inputs:

    String[] testStrings = {
        "The time is 12-jan-20 12:11:32.1112 AM",
        "hello 12:34:12 AM",
        "The time is now 3:16:01                PM",
    };

    for (String test : testStrings) {
        String result = test.replaceAll(".* (\\d+:\\d+:\\d+).*(..)$", "$1 $2");
        System.out.println(result);
    }

Prints:

12:11:32 AM
12:34:12 AM
3:16:01 PM
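The idea raised in the comments on the question, trying several mutually exclusive patterns until one matches, could look roughly like the sketch below. The class TimePatterns, the method firstMatch, the group names hms and ampm, and both patterns are hypothetical; in practice the list might be loaded from configuration instead of being hard-coded.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimePatterns {
    // Hypothetical pattern list, tried in order. Every pattern defines the same
    // named groups ("hms" and "ampm") so the calling code can treat them alike.
    private static final Pattern[] PATTERNS = {
        // Time with fractional seconds, e.g. 12:11:32.1112 AM
        Pattern.compile("(?<!\\d)(?<hms>\\d{1,2}:[0-5]\\d:[0-5]\\d)\\.\\d+\\s*(?<ampm>[AaPp][Mm])"),
        // Time without fractional seconds, e.g. 12:34:12 AM
        Pattern.compile("(?<!\\d)(?<hms>\\d{1,2}:[0-5]\\d:[0-5]\\d)\\s*(?<ampm>[AaPp][Mm])"),
    };

    /** Returns the time from the first pattern that matches, or empty if none does. */
    static Optional<String> firstMatch(String text) {
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(text);
            if (m.find()) {
                return Optional.of(m.group("hms") + " " + m.group("ampm"));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // prints 12:11:32 AM
        System.out.println(firstMatch("The time is 12-jan-20 12:11:32.1112 AM").orElse("no match"));
        // prints 12:34:12 AM
        System.out.println(firstMatch("hello 12:34:12 AM").orElse("no match"));
    }
}
```

Supporting a new format then means adding one pattern to the list, leaving the matching loop untouched.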

WJS
  • It is a good idea, but in my case there are a lot of strings to be parsed with the regex, so I cannot change the code just for this string alone. Hope you understand. – Selva Jan 22 '20 at 16:19
  • This is not string specific. It is format specific. You can apply that regex to any string where the date and time match that similar format. And according to your attempted regex, it would appear to do so. – WJS Jan 22 '20 at 16:21
  • I updated my answer. I do not understand why this would not work for you. – WJS Jan 22 '20 at 16:28
  • This will work, but I don't want to make changes in code just for this single format, because there are tons of strings parsed with that regex and those strings don't have the millisecond. So I am trying to find a way to match the string just by modifying the regex. – Selva Jan 22 '20 at 16:33