28

I'm trying to get the last result of a match without having to cycle through .find().

Here's my code:

String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("num ([0-9]+)");
Matcher m = p.matcher(in);

if (m.find()) {
     in = m.group(1);
}

That will give me the first result. How do I find the LAST match without cycling through a potentially huge list?

Vova Yatsyk
kireol
  • Can you be sure it's the last thing in the string? If so, just use the end-of-line anchor `$`, as in `/num ([0-9]+)$/`, however that translates into Java. – NorthGuard Jun 20 '11 at 21:02
  • You could write a recursive method, but I doubt that it makes sense. – s106mo Jun 20 '11 at 21:06

11 Answers

21

You could prepend .* to your regex, which will greedily consume all characters up to the last match:

import java.util.regex.*;

class Test {
  public static void main (String[] args) {
    String in = "num 123 num 1 num 698 num 19238 num 2134";
    Pattern p = Pattern.compile(".*num ([0-9]+)");
    Matcher m = p.matcher(in);
    if(m.find()) {
      System.out.println(m.group(1));
    }
  }
}

Prints:

2134

You could also reverse the string as well as change your regex to match the reverse instead:

import java.util.regex.*;

class Test {
  public static void main (String[] args) {
    String in = "num 123 num 1 num 698 num 19238 num 2134";
    Pattern p = Pattern.compile("([0-9]+) mun");
    Matcher m = p.matcher(new StringBuilder(in).reverse());
    if(m.find()) {
      System.out.println(new StringBuilder(m.group(1)).reverse());
    }
  }
}

But neither solution is better than just looping through all matches using while (m.find()), IMO.

Bernhard Barker
Bart Kiers
  • 3
    Yeah I think that's cheating :-). It would be extremely difficult to extend this to the general case. – Mark Peters Jun 20 '11 at 21:48
  • 3
    +1 for the second solution, but -1 for that abomination you started with. ;) – Alan Moore Jun 20 '11 at 22:28
  • 1
    The reason I don't want to loop through while(m.find()) is that I'm parsing HTML and have a lot of results. I'm trying to make my code as efficient as possible. My thinking is that needlessly looping through an entire array just to get the last element would be slow. Shame on Java's regex for not exposing the number of results. I'll give yours a try. – kireol Jun 21 '11 at 00:47
15

This also works to get the last match; I'm not sure why it was not mentioned earlier:

String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("num '([0-9]+) ");
Matcher m = p.matcher(in);
if (m.find()) {
  in= m.group(m.groupCount());
}
araut
  • You are right! The thread starter did not want information about the index, only the content. This looks like the real right answer. – KFleischer Jul 01 '13 at 10:18
  • @KFleischer are you sure this works? the regex doesn't make any sense relative to the input string – necromancer Aug 29 '14 at 00:59
  • @necromancer This was a while ago, so I just quickly thought about what's going on: the pattern used is the one the thread starter said works for him, finding the first match. The only change to the thread starter's code was to use the number of findings to address the last group. This is simple, and I believe it worked for me back in the day when I wrote my comment. – KFleischer Aug 30 '14 at 10:12
  • 6
    oh, by the way, i realize that you may have misunderstood the semantics of `m.groupCount()` -- it has nothing to do with how many matches were found. it is the count of how many groups are there in the regular expression. in your sample code it would always be 1 because there is just 1 group in your regular expression. – necromancer Aug 30 '14 at 10:39
  • 1
    @KFleischer i realize you are not the person who answered ;) this answer is bizarre actually. i plugged it into a main class and the value of `in` is `num 123 num 1 num 698 num 19238 num 2134`, lol :v – necromancer Aug 30 '14 at 10:41
  • This actually doesn't work as @necromancer pointed out, but I see where araut was going. Perhaps he was thinking of something like this, which does work: `String result2=null; for (int index = in.length()-1; result2==null&&index>=0;index--){ System.out.println("index is "+index); if (m.find(index)) { result2= m.group(1); System.out.println("result2 is "+result2); } }` – michaelok Aug 28 '15 at 23:22
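
To make the point from the comments concrete, here is a minimal sketch using the pattern and input from the question. The class and method names (`GroupCountDemo`, `groupsInPattern`, `lastGroupOne`) are made up for illustration. It shows that `groupCount()` reports the number of capturing groups in the pattern, not the number of matches, and that a plain `find()` loop is one way to actually reach the last match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // groupCount() depends only on the pattern: it counts capturing groups,
    // regardless of how many times the pattern matches the input.
    static int groupsInPattern(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Walks over every match and keeps group 1 of the most recent one.
    // Returns null when the pattern never matches.
    static String lastGroupOne(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return last;
    }

    public static void main(String[] args) {
        String in = "num 123 num 1 num 698 num 19238 num 2134";
        System.out.println(groupsInPattern("num ([0-9]+)", in)); // prints 1
        System.out.println(lastGroupOne("num ([0-9]+)", in));    // prints 2134
    }
}
```

So `m.group(m.groupCount())` is just `m.group(1)` here, which is the first match after a single `find()`.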
6

Why not keep it simple?

in.replaceAll(".*[^\\d](\\d+).*", "$1")
Garrett Hall
  • 14
    Could you explain what it does? – KFleischer Jul 01 '13 at 10:14
  • replace pattern is also greedy, so it looks for: 'any symbols followed by non-digit symbol followed by any number of digits (this is our last number) followed by any symbols at the end (this wasn't asked but still useful) ' and replace it with the first group. The first group is the one in the brackets and it is our last number. – Vova Yatsyk Aug 09 '22 at 13:55
  • the solution will not work if the last number is at the start of the string: `123 num`. For the most general answer check negative lookahead. – Vova Yatsyk Aug 09 '22 at 13:59
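
A small sketch of the behaviour described in the comments above. `original` is the one-liner from this answer. `optionalPrefix` is my own variation, not something proposed in the thread: it makes the "anything followed by a non-digit" prefix optional so that a number at the very start of the string is still picked up. The class and method names are hypothetical.

```java
class ReplaceAllDemo {
    // The one-liner from the answer: requires a non-digit right before the number.
    static String original(String in) {
        return in.replaceAll(".*[^\\d](\\d+).*", "$1");
    }

    // Assumed variation: the greedy ".*\\D" prefix is optional, so the pattern
    // can still match when the only number starts at index 0.
    static String optionalPrefix(String in) {
        return in.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
    }

    public static void main(String[] args) {
        String in = "num 123 num 1 num 698 num 19238 num 2134";
        System.out.println(original(in));              // prints 2134
        System.out.println(original("123 num"));       // prints 123 num (no match, input unchanged)
        System.out.println(optionalPrefix(in));        // prints 2134
        System.out.println(optionalPrefix("123 num")); // prints 123
    }
}
```

Note that both versions return the input unchanged when it contains no digits at all, so the caller cannot tell "no match" from "the whole string was the number" without an extra check.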
3

Use negative lookahead:

String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("num (\\d+)(?!.*num \\d+)");
Matcher m = p.matcher(in);

if (m.find()) {
    in= m.group(1);
}

The regular expression reads as "num followed by one space and at least one digit without any (num followed by one space and at least one digit) at any point after it".

You can get even fancier by combining it with positive lookbehind:

String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("(?<=num )\\d+(?!.*num \\d+)");
Matcher m = p.matcher(in);

if (m.find()) {
    in = m.group();
}

That one reads as "at least one digit preceded by (num and one space) and not followed by (num followed by one space and at least one digit) at any point after it". That way you don't have to mess with grouping and worry about the potential IndexOutOfBoundsException thrown from Matcher.group(int).

dhalsim2
3

Java does not provide such a mechanism. The only thing I can suggest would be a binary search for the last index.

It would be something like this:

N = haystack.length();
if ( matcher.find(N/2) ) {
    recursively try right side
} else {
    recursively try left side
}

Edit

And here's code that does it since I found it to be an interesting problem:

import org.junit.Test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static org.junit.Assert.assertEquals;

public class RecursiveFind {
    @Test
    public void testFindLastIndexOf() {
        assertEquals(0, findLastIndexOf("abcdddddd", "abc"));
        assertEquals(1, findLastIndexOf("dabcdddddd", "abc"));
        assertEquals(4, findLastIndexOf("aaaaabc", "abc"));
        assertEquals(4, findLastIndexOf("aaaaabc", "a+b"));
        assertEquals(6, findLastIndexOf("aabcaaabc", "a+b"));
        assertEquals(2, findLastIndexOf("abcde", "c"));
        assertEquals(2, findLastIndexOf("abcdef", "c"));
        assertEquals(2, findLastIndexOf("abcd", "c"));
    }

    public static int findLastIndexOf(String haystack, String needle) {
        return findLastIndexOf(0, haystack.length(), Pattern.compile(needle).matcher(haystack));
    }

    private static int findLastIndexOf(int start, int end, Matcher m) {
        if ( start > end ) {
            return -1;
        }

        int pivot = ((end-start) / 2) + start;
        if ( m.find(pivot) ) {
            //recurse on right side
            return findLastIndexOfRecurse(end, m);
        } else if (m.find(start)) {
            //recurse on left side
            return findLastIndexOfRecurse(pivot, m);
        } else {
            //not found at all between start and end
            return -1;
        }
    }

    private static int findLastIndexOfRecurse(int end, Matcher m) {
        int foundIndex = m.start();
        int recurseIndex = findLastIndexOf(foundIndex + 1, end, m);
        if ( recurseIndex == -1 ) {
            return foundIndex;
        } else {
            return recurseIndex;
        }
    }

}

I haven't found a breaking test case yet.

Mark Peters
  • I found a corner case where it will not work: Make a pattern that consists of optional parts. If one part of the pattern falls on one side of the binary search, and the second on the other side, the search will find only a small part of the overall pattern. Your code is not finding the maximum match. – KFleischer Jul 01 '13 at 10:13
  • @KFleischer: Isn't that desirable in this case? Shouldn't the last occurrence of `[a]+` in `aaaa` be at index 3, not at index 0? When you're searching for the last index of something, it seems reasonable to accept a minimal match if it results in a greater index. Maybe you could give a specific example if you think it's not desired behaviour. – Mark Peters Jul 02 '13 at 03:34
2

Java quantifiers are greedy by default, so the following should do it:

    String in = "num 123 num 1 num 698 num 19238 num 2134";
    Pattern p = Pattern.compile( ".*num ([0-9]+).*$" );
    Matcher m = p.matcher( in );

    if ( m.matches() )
    {
        System.out.println( m.group( 1 ));
    }
krico
0

Regular expressions are greedy:

Matcher m=Pattern.compile(".*num ([0-9]+)",Pattern.DOTALL).matcher("num 123 num 1 num 698 num 19238 num 2134");

After a successful m.find(), group 1 of this Matcher holds the last match, and you can apply the same trick to most regexes by prepending ".*". Of course, if you can't use DOTALL, you might want to use (?:\d|\D) or something similar as your wildcard.

yingted
0

This seems like an equally plausible approach.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LastMatchTest {
        public static void main(String[] args) throws Exception {
            String target = "num 123 num 1 num 698 num 19238 num 2134";
            // The outer non-capturing group repeats; group 1 keeps its last iteration.
            Pattern regex = Pattern.compile("(?:.*?num.*?(\\d+))+");
            Matcher regexMatcher = regex.matcher(target);

            if (regexMatcher.find()) {
                System.out.println(regexMatcher.group(1));
            }
        }
    }

The .*? is a reluctant match so it won't gobble up everything. The ?: forces a non-capturing group so the inner group is group 1. Matching multiples in a greedy fashion causes it to match across the entire string until all matches are exhausted leaving group 1 with the value of your last match.
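
The mechanism described above (a capturing group inside a repeated group ends up holding the capture from the last iteration) may be easier to see on a tiny input. This is only an illustrative sketch; the class name, the helper `lastIteration`, and the toy pattern `(?:(\d),?)+` are my own and not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns group 1 after the first successful find(), or null if nothing matches.
    // When group 1 sits inside a repeated group, it holds the last iteration's capture.
    static String lastIteration(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Toy case: the group (\d) is matched three times; only the last capture is kept.
        System.out.println(lastIteration("(?:(\\d),?)+", "1,2,3")); // prints 3

        // The pattern from the answer, applied to the input from the question.
        String target = "num 123 num 1 num 698 num 19238 num 2134";
        System.out.println(lastIteration("(?:.*?num.*?(\\d+))+", target)); // prints 2134
    }
}
```

The engine still walks over every occurrence to get there, so this is a more compact way to write the loop rather than a way to avoid the scan.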

Bradley M Handy
0
String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("num ([0-9]+)");
Matcher m = p.matcher(in);
String result = "";

while (m.find())
{
     result = m.group(1);
}
Norman Seßler
0

Compared to the currently accepted answer, this one does not blindly discard elements of the list using the ".*" prefix. Instead, it uses "(element delimiter)*(element)" to pick out the last element using .group(2). See the function magic_last in the code below.

To demonstrate the benefit of this approach I have also included a function to pick out the n-th element which is robust enough to accept a list that has fewer than n elements. See the function magic in code below.

Filtering out the "num " text and only getting the number is left as an exercise for the reader (just add an extra group around the digits pattern: ([0-9]+) and pick out group 4 instead of group 2).

package com.example;

import static java.lang.System.out;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Foo {

  public static void main (String [] args) {
    String element = "num [0-9]+";
    String delimiter = ", ";
    String input;
    input = "here is a num bro: num 001; hope you like it";
    magic_last(input, element, delimiter);
    magic(1, input, element, delimiter);
    magic(2, input, element, delimiter);
    magic(3, input, element, delimiter);
    input = "here are some nums bro: num 001, num 002, num 003, num 004, num 005, num 006; hope you like them";
    magic_last(input, element, delimiter);
    magic(1, input, element, delimiter);
    magic(2, input, element, delimiter);
    magic(3, input, element, delimiter);
    magic(4, input, element, delimiter);
    magic(5, input, element, delimiter);
    magic(6, input, element, delimiter);
    magic(7, input, element, delimiter);
    magic(8, input, element, delimiter);
  }

  public static void magic_last (String input, String element, String delimiter) {
    String regexp = "(" + element + delimiter + ")*(" + element + ")";
    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        out.println(matcher.group(2));
    }
  }

  public static void magic (int n, String input, String element, String delimiter) {
    String regexp = "(" + element + delimiter + "){0," + (n - 1) + "}(" + element + ")(" + delimiter + element + ")*";
    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        out.println(matcher.group(2));
    }
  }

}

Output:

num 001
num 001
num 001
num 001
num 006
num 001
num 002
num 003
num 004
num 005
num 006
num 006
num 006
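
For the exercise mentioned above (returning only the digits), a possible sketch looks like this. It assumes the same element and delimiter as in the answer, with an extra capturing group around the digits; the class name `LastDigitsDemo` and the method `lastDigits` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastDigitsDemo {
    // Same shape as magic_last, but the element now captures its digits.
    // Group numbering: 1 = repeated "element + delimiter", 2 = digits inside group 1,
    // 3 = the final element, 4 = digits of the final element.
    static String lastDigits(String input) {
        String element = "num ([0-9]+)";
        String delimiter = ", ";
        String regexp = "(" + element + delimiter + ")*(" + element + ")";
        Matcher matcher = Pattern.compile(regexp).matcher(input);
        return matcher.find() ? matcher.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastDigits(
            "here are some nums bro: num 001, num 002, num 003, num 004, num 005, num 006; hope you like them")); // prints 006
        System.out.println(lastDigits("here is a num bro: num 001; hope you like it")); // prints 001
    }
}
```

As with magic_last, this finds the last element of the first delimiter-separated list in the input, not the last element anywhere in the string.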
necromancer
0

Just use \Z, the end-of-input anchor (this only helps if the last match sits at the very end of the string):

String in = "num 123 num 1 num 698 num 19238 num 2134";
Pattern p = Pattern.compile("num ([0-9]+)\\Z");
Matcher m = p.matcher(in);

if (m.find()) {
     in = m.group(1);
}