0

For example, I have a field with the value

String a="Items:#1000#,#2000#";

for which I wrote the following logic, which successfully extracts the values 1000 and 2000.

Pattern p = Pattern.compile("\\#(.*?)\\#");
Matcher m = p.matcher(a);

while(m.find()){
    System.out.println(m.group(1));
}

It works OK!

But I have an issue with inputs where a stray value preceded by a single # sign appears before the values wrapped in # signs. That stray value should not be taken into account. For example:

 String a="Items:#1 #1000#,#2000#";

The value 1 should not be taken into consideration! But in this case my code returns `1 ` and `,`, which is wrong; it should again return 1000 and 2000.

Is it possible to somehow ignore the value with just one #? Unfortunately, I have many inputs with a single-# value before the properly wrapped ones. Values are always wrapped in # signs and separated by a comma.

StackFlowed
Veljko
  • Is there a way for a computer to differentiate whether it should choose `1 ` or `1000`? Both are between two `#`. For instance, should always the last option be chosen? Or maybe the value should consist only of digits? – Ghostkeeper Oct 17 '14 at 13:52
  • 1
    Hi, thanks for your effort. The values will always be at the end... The first value is practically not between two # signs, because it is followed by a blank space, if that is something that can help you – Veljko Oct 17 '14 at 13:59
  • Since you mentioned that they are separated by #value#, you could use something like Pattern.compile("\\#(.*?)\\#,"); – StackFlowed Oct 17 '14 at 14:00
  • @wrongAnswer no, it does not work; in that case it returns this result: 1 #1000 – Veljko Oct 17 '14 at 14:03
  • @Dejan check whether that part contains a #; if it does, you have to use the part after it. Or you could also change it to Pattern.compile(",\\#(.*?)\\#,"); but then you would be skipping the first element. You would get 2000 in this case. – StackFlowed Oct 17 '14 at 14:05

4 Answers

1

The following pattern matches only strings that are surrounded by # and followed by a comma or the end of the string:

Pattern p = Pattern.compile("#([^#]*)#(?=(,|$))");

You can add more alternatives to the final group (after ?=) if you also wish to match values that are followed by, for instance, newline characters.

I didn't test it in Java, only in Notepad++, but both regex engines support the syntax used here (negated character classes and lookahead).
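Since the pattern above was not tried in Java, here is a minimal sketch of how it could be checked there. The class name `LookaheadDemo` and the helper `extract` are made up for this illustration; only `Pattern` and `Matcher` from `java.util.regex` come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // A value between two '#', where the closing '#' must be
    // followed by a comma or by the end of the input.
    private static final Pattern VALUE = Pattern.compile("#([^#]*)#(?=(,|$))");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // "#1 #" is rejected because its closing '#' is followed by '1'.
        System.out.println(extract("Items:#1 #1000#,#2000#")); // [1000, 2000]
        System.out.println(extract("Items:#1000#,#2000#"));    // [1000, 2000]
    }
}
```

Because the lookahead only inspects what follows the closing #, a value that itself contains a space is still captured.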

Ghostkeeper
1

This could be solved in many ways; it really depends on how fixed the format of your data is. Given the example you listed, you could just change your regex to:

Pattern p = Pattern.compile("\\#(\\S*?)\\#");

Basically, that just specifies that the captured group cannot contain whitespace.
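As a rough illustration of what the no-whitespace rule accepts and rejects, the sketch below runs the pattern on the question's input and on a made-up input where a wanted value contains a space. The class name `NoWhitespaceDemo`, the helper `extract`, and the second input are assumptions for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoWhitespaceDemo {
    // A value between two '#' that contains no whitespace at all.
    private static final Pattern VALUE = Pattern.compile("#(\\S*?)#");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // The stray "#1 " is skipped because of the space after the 1.
        System.out.println(extract("Items:#1 #1000#,#2000#"));  // [1000, 2000]
        // A wanted value with a space inside is skipped as well.
        System.out.println(extract("Items:#1 #1000#,#20 00#")); // [1000]
    }
}
```

So this variant fits only if valid values never contain whitespace.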

user3062946
  • Thank you very much for your help, it works well. The only thing is that I am not sure whether a user will sometimes manually add a valid value containing a space, so I will use ("#([^#]*)#(?=(,|$))") – Veljko Oct 17 '14 at 14:15
1

Solution:

This will match only what you want to match. With regular expressions, simpler is always better!

#([^,\s]+)# - Explanation

Proof:

    final Pattern p = Pattern.compile("#([^,\\s]+)#");
    final Matcher m = p.matcher("Items:#1 #1000#,#2000#");
    while (m.find())
    {
        System.out.println("m.group(1) = " + m.group(1));
    }

Outputs:

m.group(1) = 1000
m.group(1) = 2000
  • The code works, I tested it. Explain why you think this is wrong; it isn't. –  Oct 17 '14 at 14:15
  • You're assuming there will always be a space or comma after the unwanted `#` (and before the first wanted one). I don't think that's a valid assumption. – Alan Moore Oct 17 '14 at 14:17
  • Thank you, it works now... When I tried it previously I had some issue. Only one question: what if a user accidentally enters a blank space in a valid value? For example, #1 #1000#,#2000 # will return only 1000. Do you maybe have a solution for this too? This would probably be a very rare case – Veljko Oct 17 '14 at 14:18
  • You can't program for every accident; that is an anti-pattern called [Defensive Programming](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=why%20defensive%20programming%20is%20bad), and no matter what anyone argues, it is a path to ruin. Erlang is able to provide nine 9s of uptime, that is `99.9999999%` uptime, without any defensive programming; as a matter of fact, it is actually hard to write defensive code in functional languages for the most part. Instead of detecting errors you have sophisticated recovery! –  Oct 17 '14 at 16:20
  • [Relevant link that would not fit in the previous comment!](http://stackoverflow.com/a/8427032/177800). The entire *"be liberal in what you accept and strict in what you produce"* idea is crap as well; that is why we have the html/css/dom/browser fiasco of incompatibilities that we have today! Garbage in is garbage, no matter what. Trying to detect it and fix it just encourages it. Back in the day, if a 9-track tape didn't load into a mainframe, you got it back with a terse error code on a sticky note. They didn't try to fix the data; it was wrong, they rejected it, and you fixed your code to be correct. –  Oct 17 '14 at 16:22
-2

The pattern #([^#]*)#(?=(,|$)) assumes that each group is followed by a comma or the end of the line. If there is a space (or any other character) after a group, it won't pick up that element.

If you always have digits (no spaces) between the # signs, you can use \\#(\\d*?)\\# instead. This solves #1 #1000#,#2000#, but not #1#1000#,#2000#.
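To make the two cases above concrete, here is a small sketch that applies the digits-only pattern to both inputs. The class name `DigitsOnlyDemo` and the helper `extract` are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsOnlyDemo {
    // A run of digits (possibly empty) between two '#'.
    private static final Pattern VALUE = Pattern.compile("#(\\d*?)#");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // The space after the stray 1 stops the match, so it is skipped.
        System.out.println(extract("#1 #1000#,#2000#")); // [1000, 2000]
        // Without the space, "#1#" is itself a valid match and uses up
        // the '#' that should have opened 1000.
        System.out.println(extract("#1#1000#,#2000#"));  // [1, 2000]
    }
}
```

In the second input the unwanted 1 is returned and 1000 is lost, which is the failure case described above.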

cfrick
acostil