
I am trying to parse a document that consists of many sections.

Each section begins with :[]: followed by a blank space, followed by one or more characters (any characters), followed by a :, a blank space, and one or more characters (any characters).

Here's an example:

:[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info from the related section starting with PARTIE-DU-CORPS.
:[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE.

The token of interest from each section is everything from :[]: to the first occurrence of :. For example, in the first section, I am only interested in extracting: :[]: Abet1, Abetted34:


At first, I used the following pattern to extract the token from each section of the document, but it matched everything from the first occurrence of : to the last occurrence of : in the section:

"\\B:\\[\\]:.*:\\B"

If I change the pattern to the following, hoping to match from :[]: to the first occurrence of :, I get no match:

"\\B:\\[\\]:\\s*.:{1}"

How would the regular expression that extracts what I want look like?

Janez Kuhar
Darvin
  • When you say that `:[]: _` (the underscore stands for a space) should be followed by *any* character until the first `:`, you're contradicting yourself. Clearly, *any* character won't do, since `:` is also a character. – Janez Kuhar Oct 09 '20 at 15:27
  • That's correct, ':' also counts as "any character", but I have tried many variations and I'm not sure how to exclude ':' from the characters matched. – Darvin Oct 09 '20 at 15:39

2 Answers


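Is this what you want? It uses a lazy quantifier (.*?) between lookarounds, so only the text between ":[]: " and the next : is matched:

(?<=:\[\]: ).*?(?=:)

Demo: https://regex101.com/r/jOmnSb/2

Or, to include the leading :[]: and the closing : in the match:

:\[\]:.*?:

Demo: https://regex101.com/r/jOmnSb/3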

UPDATE:

You can test the pattern and get it as a Java string literal here: https://www.regexplanet.com/advanced/java/index.html
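As a rough illustration, the two lazy patterns above could be used from Java along these lines. This is only a sketch: the class name LazyTokenSketch and the helper findAll are made up for the example, and it assumes the brackets are escaped as \[ and \], with every backslash doubled inside the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for this illustration.
class LazyTokenSketch {
    // Whole token: the leading ":[]:", a lazy run of characters, and the closing ':'.
    static final Pattern WHOLE = Pattern.compile(":\\[\\]:.*?:");
    // Inner text only: the lookbehind and lookahead are checked but not included in the match.
    static final Pattern INNER = Pattern.compile("(?<=:\\[\\]: ).*?(?=:)");

    // Collects every match of the pattern in the text, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text =
            ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info.\n"
          + ":[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE.";
        System.out.println(findAll(WHOLE, text));
        System.out.println(findAll(INNER, text));
    }
}
```

Because . does not match line breaks by default, each match stays within its own section line.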

VietDD
  • 1,048
  • 2
  • 12
  • 12

So you want to match a string against:

  1. :[]:_ (where _ is a space character)
  2. followed by one or more characters that are not a : (refer to this question)
  3. close the match with a : character

The regex for that would be:

:\[\]: [^:]+:

In a Java string literal, every backslash in the pattern has to be escaped itself, so \[ is written as \\[. You could do something like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTest {
    public static void main(String[] args) {
        // ":[]: ", then one or more non-colon characters, then the closing ':'.
        Pattern pattern = Pattern.compile(":\\[\\]: [^:]+:");
        Matcher matcher =
            pattern.matcher(
                ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info from the related section starting with PARTIE-DU-CORPS.\n"
              + ":[]: Ou est-ce que tu a mal: Tu as mal aux jambes: Find usage in section 145-TT-LA-TETE."
            );
        // Print the leading token of every section.
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
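To see why the negated class helps, here is a small side-by-side sketch. The class GreedyVsNegatedSketch, the helper first, and the capturing group are additions for illustration and are not part of the code above. A greedy .* runs to the end of the line and backtracks only as far as the last :, whereas [^:]+ cannot cross a : at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for this illustration.
class GreedyVsNegatedSketch {
    // Greedy: ".*" consumes the rest of the line, then backtracks to the LAST ':'.
    static final Pattern GREEDY = Pattern.compile(":\\[\\]:.*:");
    // Negated class: "[^:]+" stops before the FIRST ':'.
    // The parentheses capture only the text between ":[]: " and that ':'.
    static final Pattern NEGATED = Pattern.compile(":\\[\\]: ([^:]+):");

    // Returns the requested group of the first match, or null if nothing matches.
    static String first(Pattern pattern, String line, int group) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        String line = ":[]: Abet1, Abetted34: Find the usage in table under section 1-CB-45: Or more info.";
        System.out.println(first(GREEDY, line, 0));   // runs up to the last ':'
        System.out.println(first(NEGATED, line, 0));  // stops at the first ':'
        System.out.println(first(NEGATED, line, 1));  // token text without the delimiters
    }
}
```

Group 1 is convenient when only the text between the delimiters is needed; group 0 gives the full token the question asks for.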
Janez Kuhar
  • Thanks, this is what I wanted. When I tried this yesterday, I was using [^:].*: instead of [^:]*: and didn't know that '*' can be used on its own, without the '.' – Darvin Oct 09 '20 at 16:12