1

I have a regex which finds strings that contain exactly one occurrence of a character, say P. This works when matching against a string that contains no whitespace,

e.g.

APAXA

The regex being ^[^P]*P[^P]*$

It picks this string out fine. However, what if I have a string like

XPA  DREP EDS

What would be the regex to identify all strings in one line that match the condition (the strings are always separated by some kind of whitespace: tab, space, etc.)?

E.g., how would I highlight XPA and DREP?

I am using while (m.find()) to loop over the matches and System.out.println(m.group()) to print each one,

so m.group() has to contain the entire string.

dr85

6 Answers

2

Split it by whitespace and then check each token against your existing regex.
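
A minimal sketch of that idea, assuming the asker's original pattern ^[^P]*P[^P]*$ is reused unchanged for each token. The class and method names (SplitAndMatch, tokensWithOneP) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitAndMatch {
    // The asker's original pattern: exactly one P in the whole token.
    private static final Pattern ONE_P = Pattern.compile("^[^P]*P[^P]*$");

    // Hypothetical helper: returns every whitespace-separated token with exactly one P.
    static List<String> tokensWithOneP(String line) {
        List<String> result = new ArrayList<>();
        // trim() avoids an empty leading token when the line starts with whitespace
        for (String token : line.trim().split("\\s+")) {
            if (ONE_P.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokensWithOneP("XPA  DREP EDS")); // prints [XPA, DREP]
    }
}
```

This keeps the regex simple at the cost of losing the match positions within the original line, which only matters if you need them for highlighting.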

jzd
1

Why must it be an overly complicated regex?

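String string = "XPA  DREP EDS";
// Split on runs of whitespace (spaces, tabs, etc.)
String[] s = string.split("\\s+");
for (String str : s) {
  // contains("P") accepts one or more Ps; for exactly one,
  // test str.matches("[^P]*P[^P]*") instead
  if (str.contains("P")) {
    System.out.println(str);
  }
}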
ghostdog74
0

You can try using the \s pattern (matches whitespace). Look at this regexp page for Java.

hellatan
0
\b[^P\s]*P[^P\s]*\b

will match all words that contain exactly one P. Don't forget to double the backslashes when constructing your regex from a Java string.

Explanation:

\b      # Assert position at start/end of a word
[^P\s]* # Match any number of characters except P and whitespace
P       # Match a P
[^P\s]* # Match any number of characters except P and whitespace
\b      # Assert position at start/end of a word
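
As a rough sketch of how this pattern plugs into the asker's while (m.find()) loop, with the backslashes doubled for the Java string literal. The class and method names here are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryFind {
    // \b[^P\s]*P[^P\s]*\b written as a Java string literal
    private static final Pattern ONE_P_WORD =
            Pattern.compile("\\b[^P\\s]*P[^P\\s]*\\b");

    // Hypothetical helper: collects each m.group() instead of printing it.
    static List<String> find(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = ONE_P_WORD.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(find("XPA  DREP EDS")); // prints [XPA, DREP]
    }
}
```

A word such as APBPC yields no match at all, because there is no word boundary between the B and the second P.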

Please note that \b doesn't match all word boundaries correctly when dealing with Unicode strings (thanks tchrist for reminding me). If that is the case for you, you might want to replace the \bs with (don't look):

(?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))

(taken from this question's winning answer)

Tim Pietzcker
0

The regex being ^[^P]*P[^P]*$

Such a regex finds only strings containing exactly one P, which may or may not be what you want. I suppose you want .*P.* instead.

To find all words containing at least one P, you can use \\S+P\\S+, where \S stands for a non-whitespace character (use \\S*P\\S* if the P may be the first or last character of the word). You may consider \w instead.

To find all words containing exactly one P, you can use [^\\sP]+P[^\\sP]+(?=\\s), which is more complicated. Here, \s stands for a whitespace character, [^abc] matches everything except a, b, and c, and (?=...) is a lookahead. Without the lookahead, the pattern would match "APB" inside "APBPC".
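
One caveat with (?=\\s) is that it only checks the right-hand side and needs a whitespace character after the word, so a word at the very end of the line is skipped. A possible variant, which is my own assumption rather than part of the pattern above, is to put negative lookarounds on both sides, (?<!\S) and (?!\S), and to use * so the P may be the first or last character. The class name below is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceLookaround {
    // (?<!\S) : not preceded by a non-whitespace char (start of line or after whitespace)
    // (?!\S)  : not followed by a non-whitespace char (end of line or before whitespace)
    private static final Pattern ONE_P_TOKEN =
            Pattern.compile("(?<!\\S)[^P\\s]*P[^P\\s]*(?!\\S)");

    // Hypothetical helper: collects every whole token with exactly one P.
    static List<String> find(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = ONE_P_TOKEN.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(find("XPA  DREP EDS")); // prints [XPA, DREP]
        System.out.println(find("APBPC P"));       // prints [P]
    }
}
```

Because both lookarounds are zero-width, m.group() still contains only the token itself.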

maaartinus
  • You're wrong, or do you really mean the following is ascii? final String s = "Příliš žluťoučký kůň úpěl ďábelské ódy"; final Pattern p = Pattern.compile("\\S+l\\S+"); final Matcher m = p.matcher(s); while (m.find()) System.out.println(m.group()); – maaartinus Jan 20 '11 at 14:44
0

Try adding whitespace characters (\s) in your negated character classes, and you'll also want to remove the ^ and $ anchors:

[^P\s]*P[^P\s]*

or as a Java String literal:

"[^P\\s]*P[^P\\s]*"

Note that the above does not work on Unicode, only ASCII (as tchrist mentioned in the comments).
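
To make the Unicode caveat concrete: as far as I know, the part that is ASCII-only by default in Java is \s, which does not treat characters such as EM SPACE (U+2003) as whitespace. A sketch, assuming Java 7 or later where the Pattern.UNICODE_CHARACTER_CLASS flag is available (the class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWhitespace {
    private static final String REGEX = "[^P\\s]*P[^P\\s]*";

    // Hypothetical helper: runs the same regex with or without Unicode-aware \s.
    static List<String> find(String line, boolean unicodeAware) {
        int flags = unicodeAware ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        Matcher m = Pattern.compile(REGEX, flags).matcher(line);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String line = "XPA\u2003DREP"; // the two words are separated by an EM SPACE
        // Default \s: the EM SPACE is not a separator, so the first match runs across it
        System.out.println(find(line, false));
        // Unicode-aware \s: the EM SPACE separates the words
        System.out.println(find(line, true));
    }
}
```

Non-ASCII letters inside the words are not a problem for this pattern either way, since [^P\s] accepts any character that is not P or whitespace.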

Bart Kiers
  • With the proviso that that’s only going to work on ASCII characters, not non-ASCII Unicode characters. – tchrist Jan 20 '11 at 14:38