2

I am trying to use regex pattern matching in Java to import values from a structured file that contains two distinct patterns.

I have a file that may look like this:

[Group Variable]
name = Value

[Valid Extensions]
images = {
jpeg
png
}

This file is a config file for a Java program. I am using a modified version of the Java code from: What is the easiest way to parse an INI file in Java?

This code lets me request a specific variable by name, such as name, so there is no need to save anything to the left of the equals sign.

The first pattern is simple, "Grab any content on the line after the equals sign". The regex for that is pretty simple: (\s*([^=]*)=(.*))
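For reference, a minimal sketch of how that first pattern behaves on a single line. The class and method names (KeyValueLine, valueOf) are made up for illustration, and trimming the value is an assumption about what is wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueLine {
    // Same pattern as above: group 2 is the key, group 3 is everything after the first '='.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\s*([^=]*)=(.*))");

    /** Returns the trimmed text after the first '=' on the line, or null if the line has no '='. */
    static String valueOf(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        return m.matches() ? m.group(3).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("name = Value"));      // prints Value
        System.out.println(valueOf("[Group Variable]"));  // prints null
    }
}
```

On the line images = { this returns only {, which is why the second pattern is needed.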

The second is a little more complicated: "grab all content after the equals sign between the curly braces" (i.e. the braces enclose elements of an array spread out across multiple rows).

I have tried to find the text between two curly braces using a modification of (?<=\\{)(.*?)(?=\\})

I have tried to set up an if statement to ignore a line containing an open curly brace, like ([^\{]|^)* https://stackoverflow.com/a/1264575/4383447. From my reading, regex will support if-then-else logic (?(?=regex)then|else) so

I haven't been able to get the regex for this, or the combination of the two, working. It is preferred that I use a single, more complicated regex capable of handling both cases rather than iteration or recursion on the Java side.

Interestingly, some of my attempts seem to fail on the Java side, while others that might have worked did not appear to work when tested at https://regex101.com/r/aG1xO0/2 . A few of the attempts I still had recorded when I decided to post this question are below. I no longer have my attempts at if/and/or logic alternatives.

(\s*([^=]*)=\{)(.*?)(?=\})
(\s*([^=]*)=(?<=\{)(.*?)(?=\}))
\s*([^=]*)=(?(?=([^{]|^)(.*))(.*)|{([^}]*)})
\s*([^=]*)=(.*))|(\s*([^={*}]*)=\{)(.*?)(?=\})
EngBIRD
  • Is your question about the second regex? Try [`String pat = "(?s)=(.*?)(?=\n\\[|$)"`](https://regex101.com/r/eQ5eC4/1) with `Matcher#find()`. – Wiktor Stribiżew Mar 26 '16 at 23:47
  • What results you expect here exactly? Should `name` and `images` be also part of your result? Or maybe you are interested only in `Value` and `{...}` part? – Pshemo Mar 26 '16 at 23:48

3 Answers

2

Based on your description you may be looking for something like

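// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// data is the whole file content as one String (not a single line)
Pattern p = Pattern.compile("=\\s*(\\{[^}]*\\}|.*)");
Matcher m = p.matcher(data);
while (m.find()) {
    // group 1 is either the {...} block or the rest of the line after '='
    System.out.println(m.group(1));
    System.out.println("------");
}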

DEMO

Explanation.

We are looking for the part that follows = and optional whitespace. We don't need the = and the whitespace themselves in the result, so we can either

  1. use look-behind (?<=...)

or

  2. wrap the needed part in a capturing group.

Option 1 is impossible here because look-behind must have an obvious maximum length, which \s* (zero or more whitespaces) prevents. That leaves us with option 2.
Now we need to describe the two cases we are interested in. To do so we will use case1|case2 and put it in a capturing group. To avoid a situation where matching case1 prevents matching case2, we need to write the most specific case first. Here that is the regex representing the {.\n.\n.} area, because a regex matching only one line ({.) could prevent us from matching the rest (the \n.\n.} part).

Now {...} can be represented as \\{[^}]*\\}. [^}] means any non-} character, so it can also match line separators. That gives it an advantage over .*? because we don't need to make . match line separators with the Pattern.DOTALL flag. We also don't need the reluctant quantifier *?, which reduces performance a little because of backtracking.

Avoiding Pattern.DOTALL also has the advantage that we can write the regex for the second case (rest of the line after =) simply as .* because . will not match line separators.
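To see the ordering of the two alternatives in action, here is a small self-contained sketch run against the sample file from the question. The class and method names (ValueExtractor, values) are illustrative only, and it assumes the whole file is held in a single String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueExtractor {
    // Brace block first (most specific case), then "rest of the line" as the fallback.
    private static final Pattern VALUE = Pattern.compile("=\\s*(\\{[^}]*\\}|.*)");

    /** Collects group 1 of every match: a {...} block or the text after '=' on one line. */
    static List<String> values(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(data);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "[Group Variable]\nname = Value\n\n"
                + "[Valid Extensions]\nimages = {\njpeg\npng\n}\n";
        for (String v : values(data)) {
            System.out.println(v);
            System.out.println("------");
        }
    }
}
```

If the alternatives were swapped to .*|\\{[^}]*\\}, the second match would stop at the end of the images = { line and return only {.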


If you also want to include the property name, you could use the ^([^=\n\r]+?)\s*=\s*(\{([^}]*)\}|.*) regex with the MULTILINE flag (which allows ^ to match at the start of each line, not only at the start of the entire text).

DEMO 2
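A rough sketch of how the name-aware regex could feed a lookup map. The PropertyParser class, the parse method and the choice of splitting brace content on whitespace are assumptions made for illustration, not part of the original INI parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyParser {
    // group 1: property name, group 2: whole value, group 3: text inside {...} (null for plain values)
    private static final Pattern PROPERTY = Pattern.compile(
            "^([^=\\n\\r]+?)\\s*=\\s*(\\{([^}]*)\\}|.*)", Pattern.MULTILINE);

    /** Maps each property name to its values; a plain value becomes a one-element list. */
    static Map<String, List<String>> parse(String data) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = PROPERTY.matcher(data);
        while (m.find()) {
            List<String> values = new ArrayList<>();
            String inner = m.group(3);
            if (inner != null) {
                // Assumption: array items are separated by whitespace or line breaks.
                for (String item : inner.trim().split("\\s+")) {
                    if (!item.isEmpty()) {
                        values.add(item);
                    }
                }
            } else {
                values.add(m.group(2).trim());
            }
            result.put(m.group(1).trim(), values);
        }
        return result;
    }
}
```

With the sample file from the question, parse(data).get("images") would hold jpeg and png, and parse(data).get("name") would hold Value. Section headers such as [Group Variable] are skipped because they contain no = on their own line.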

Pshemo
  • Thanks, I updated my question to try and make it clearer. your DEMO was very informative, but I haven't successfully integrated into my code yet. It runs fine, but item extraction compared to the other field keys isn't quite working yet. I am investigating why that is, because if I understand `(?<==)\s*(\{[^}]*\}|.*)` correctly, it only applies to the `a={}` extraction. – EngBIRD Mar 27 '16 at 03:18
  • Oh, I forgot to update my code in linked demo. When I was writing explanation I thought that having `(?<==)` isn't really necessary and we replace it with simple `=` because we are simply using only part in capturing group. I updated it now. – Pshemo Mar 27 '16 at 03:51
  • @EngBIRD "*but item extraction compared to the other field keys isn't quite working yet*" can you give some example? As you see from demo it should work fine for text you provided in your question. "*I understand `(?<==)\s*(\{[^}]*\}|.*)` correctly, it only applies to the a={} extraction.*" `.*` part should also handle rest of line after `=` so it should handle both cases. – Pshemo Mar 27 '16 at 12:11
  • Thanks, I am still working on the java side, (silly me, used a buffered line reader which of course restricts the strings I am attempting to pattern match with...). Thanks for helping to fill in my understanding of the regex that should work, I didn't think it returned both sides because the online regex engine I linked to in my question didn't return results for both conditions but now that I test it more thoroughly, it is only returning the results of the first test, so I am optimistic now. – EngBIRD Mar 27 '16 at 21:27
  • I am not sure what problems you are facing now but maybe this demo will help you https://regex101.com/r/aG1xO0/4 (assuming you want to also include name of property) – Pshemo Mar 27 '16 at 23:54
  • I think this latest regex algorithm you included in your last link has done the trick! `^([^=\\n\\r]+?)\\s*=\\s*(\\{([^}]*)\\}|.*)`. I inserted a function to merge newlines between curly braces into the same one, and your regex expression now does the trick detecting both types of expressions. Can't say that I fully understand the new left hand side with the new lines and tabs... But It's working great so far! Thanks! – EngBIRD Mar 28 '16 at 16:44
0
\{([\w\n]*)\}

This extracts jpeg and png from the structure.
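As a sketch of how the captured group might be turned into separate items (the ExtensionList class and items method are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtensionList {
    private static final Pattern BLOCK = Pattern.compile("\\{([\\w\\n]*)\\}");

    /** Returns the non-empty lines found inside the first {...} block, or an empty list. */
    static List<String> items(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(data);
        if (m.find()) {
            for (String line : m.group(1).split("\\n")) {
                if (!line.isEmpty()) {
                    out.add(line);
                }
            }
        }
        return out;
    }
}
```

Note that [\w\n] does not include \r, spaces or dots, so a block with Windows line endings or with entries like .jpeg would not match this pattern as written.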

0

As not all the lines contain curly braces, I would recommend using two steps to split the String (so that you can still continue processing the original String if match for curly braces is not found).

Step 1 would be to extract Strings with your regex, and once we get the String, we can use the following to extract the content between curly braces:

String string = "fdwfs{aaaa}fsfds";
Pattern pattern = Pattern.compile("\\{(.*?)\\}");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

The while loop is not entered if no match is found. In that case, we can process the whole String.
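The fallback described above could be wrapped in a small helper, sketched here. The BraceContent class and unwrap method are hypothetical names, and Pattern.DOTALL is an addition to the pattern above so that . also matches line breaks when the braces span several lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceContent {
    // DOTALL is an assumption added here; without it, ".*?" cannot cross line breaks.
    private static final Pattern BRACES = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);

    /** Returns the trimmed text between the first pair of braces, or the trimmed input if there is none. */
    static String unwrap(String value) {
        Matcher m = BRACES.matcher(value);
        return m.find() ? m.group(1).trim() : value.trim();
    }

    public static void main(String[] args) {
        System.out.println(unwrap("fdwfs{aaaa}fsfds")); // prints aaaa
        System.out.println(unwrap(" Value"));           // prints Value
    }
}
```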

Darshan Mehta