
I'm currently trying to solve a problem from codingbat.com with regular expressions.

I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.

Here is the prompt: Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.

wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"

etc

My code thus far:

String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);

String newStr = "";
while(m.find())
    newStr += m.group().replace(word, "");

return newStr;

The problem is that when two occurrences of word are separated by a single character, the program misses the character preceding the second occurrence: the previous match has already consumed it as its trailing .?, and m.find() resumes searching after it.

For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", without repeating the "i".
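To make the cause visible, here is a minimal sketch that runs the same pattern and prints the span each match consumes. The class and method names are made up for illustration, and Pattern.quote is added only so the word is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumingMatchDemo {
    // Same idea as the code above: optional char, the word, optional char.
    static String wordEndsConsuming(String str, String word) {
        Matcher m = Pattern.compile(".?" + Pattern.quote(word) + ".?").matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Each match consumes [start, end); the next find() resumes at end.
            System.out.println("matched \"" + m.group() + "\" at ["
                    + m.start() + "," + m.end() + ")");
            out.append(m.group().replace(word, ""));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The second match "z1i" ends at index 9, so the third match
        // starts on the final "1" and can no longer see the "i".
        System.out.println(wordEndsConsuming("abc1xyz1i1j", "1")); // prints cxzij
    }
}
```

The three matches are "c1x" at [2,5), "z1i" at [6,9) and "1j" at [9,11): the "i" at index 8 belongs to the second match only.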

I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

Bohemian
Rishi
  • See this answer about look-around regular expressions http://stackoverflow.com/a/2995621/324900 – Reddy Nov 03 '12 at 19:14
  • @user1796994 See my undeleted, repaired answer for a one-line solution – Bohemian Nov 04 '12 at 00:51
  • @user1796994 See my (edited) answer for how to do it in just one line (including test code). You may not consider it "non-messy", but it's sure less messy than a many-line solution IMHO. – Bohemian Nov 04 '12 at 11:15

3 Answers

1

This is a one-liner solution:

String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");

The first alternative handles your edge case with a lookahead inside the non-capturing group, so the shared character is not consumed; the second alternative handles the usual (consuming) case.

Note that your requirements don't call for iteration; only your question title assumes it's necessary.

Note also that, to be safe, you should escape word in case it contains any regex metacharacters: unless you can guarantee it doesn't, use Pattern.quote(word) instead of word.
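As a small illustration of why quoting matters, the sketch below counts matches of a word that happens to be a regex metacharacter, with and without Pattern.quote. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Unquoted, "." means "any character" and matches all three chars.
        System.out.println(count(".", "a.b"));                // prints 3
        // Quoted, "." is a literal dot and matches once.
        System.out.println(count(Pattern.quote("."), "a.b")); // prints 1
    }
}
```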

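Here's a test of the usual case and the edge case from the question. Note that the pattern only captures around an occurrence of word that has a character on both sides, so an input such as "XY123XY" (expected "13") finds no match and comes back unchanged: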

public static String wordEnds(String input, String word) {
    word = Pattern.quote(word); // add this line to be 100% safe
    return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}

public static void main(String[] args) {
    System.out.println(wordEnds("abcXY123XYijk", "XY"));
    System.out.println(wordEnds("abc1xyz1i1j", "1"));
}

Output:

c13i
cxziij
Bohemian
  • This isn't quite right - I'm going to come back to this later – Bohemian Nov 03 '12 at 19:44
  • @Bohemian that is incorrect he needs `cxziij` as output not `cxzi`..that is the reason y i had used lookarounds... – Anirudha Nov 04 '12 at 07:41
  • @Fake.It.Til.U.Make.It Although I previously stated this *wasn't* a solution, I have figured out the regex that actually (really) works - see edited answer for a *fully working* one line solution. – Bohemian Nov 04 '12 at 11:12
0

Use positive lookbehind and positive lookahead, which are zero-width assertions:

(?<=(.)|^)1(?=(.)|$)
    ^     ^     ^-- looks for a character after 1 and captures it in group 2
    |     |-- matches 1; you can replace it with any word
    |
    |-- looks for a character just before 1 and captures it in group 1. This is a zero-width assertion: it only tests the surrounding text and does not consume it, so the same character can be captured again by the next match

Groups 1 and 2 ($1 and $2) then contain your values. Keep calling find() until the end of the input.
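To see that the lookarounds consume nothing, here is a minimal sketch that records the span of each match alongside the two captured characters. It uses a simplified pattern without the ^ and $ alternatives, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthDemo {
    // One entry per match, formatted as "start-end:before,after".
    static List<String> spans(String input) {
        Matcher m = Pattern.compile("(?<=(.))1(?=(.))").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // start/end cover only the "1"; the neighbours are captured
            // by the lookarounds but stay available for the next match.
            out.add(m.start() + "-" + m.end() + ":" + m.group(1) + "," + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("a1b1c")); // prints [1-2:a,b, 3-4:b,c]
    }
}
```

Each match is one character wide, and the "b" between the two occurrences is captured twice: once as the character after the first match and once as the character before the second.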

So the code would look like this:

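import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s1 = "abcXY123XYiXYjk";
String s2 = Pattern.quote("XY"); // escape any regex metacharacters in the word
String s3 = "";
String r = "(?<=(.)|^)" + s2 + "(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
// Caution: group(1) or group(2) is null when the word sits at the very start or end of s1
while (m.find()) s3 += m.group(1) + m.group(2);
// s3 now contains c13iij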

works here
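A possible way to package the lookaround approach as the wordEnds method from the prompt is sketched below. Matcher.group(n) returns null for a group that did not take part in the match, which happens here whenever ^ or $ matched instead of a character, so the sketch skips those groups rather than appending them. The class name is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundWordEnds {
    static String wordEnds(String str, String word) {
        String w = Pattern.quote(word); // treat the word literally
        Matcher m = Pattern.compile("(?<=(.)|^)" + w + "(?=(.)|$)").matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // A null group means the word touched the start or end of str.
            if (m.group(1) != null) out.append(m.group(1));
            if (m.group(2) != null) out.append(m.group(2));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // prints c13i
        System.out.println(wordEnds("XY123XY", "XY"));       // prints 13
        System.out.println(wordEnds("XY1XY", "XY"));         // prints 11
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // prints cxziij
    }
}
```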

Anirudha
  • -1 Waaaaaay too complicated, and actually wrong. You don't need look arounds! Just use `(.)` - he says "don't match if there isn't a character", but you're overachieving by matching start and end, which is actually *not* what the OP says he wants – Bohemian Nov 03 '12 at 19:44
  • @Bohemian I liked your original answer because of its simplicity, so I would appreciate if you could post that (with str.replace) – Rishi Nov 03 '12 at 19:58
  • @Bohemian can u tell even a single case where this regex would fail – Anirudha Nov 03 '12 at 20:08
  • No, but just because it compiles and executes correctly does **NOT** make it "good code". Let me put it bluntly... this is the regex you should use: `(.)1(.)`. Google "KISS principle" – Bohemian Nov 03 '12 at 22:35
  • @user1796994 I will post it again, but corrected - it isn't quite right and I need a little time to work on it, and I'm quite busy ATM. However, I will get back to it. There's definitely a one-liner that will work. – Bohemian Nov 03 '12 at 22:37
  • @Fake.It.Til.U.Make.It I've taken my prosaic and feel much better now - I've removed my -1. That was a bit harsh. My criticism stands, though, that it's too complicated. – Bohemian Nov 04 '12 at 00:38
  • @Bohemian How is it too complicated? Matching start and end is needed for cases like 'aaaX', and look arounds are needed for cases like 'aXaXa'. Removing them will stop the edge cases from being handled correctly. – fgb Nov 04 '12 at 01:35
0

Use regex as follows:

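// a is the input string, b is the word, c is the result String (initially empty)
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
// group 1 is the character consumed before the word (or empty); group 2 is the character after it, captured by the lookahead without consuming it
while (m.find()) c += m.group(1) + m.group(2);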

Check this demo.

Ωmega