490

I have a string that has two single quotes in it, the ' character. In between the single quotes is the data I want.

How can I write a regex to extract "the data i want" from the following text?

mydata = "some string with 'the data i want' inside";
Templar
asdasd

14 Answers

713

Assuming you want the part between single quotes, use this regular expression with a Matcher:

"'(.*?)'"

Example:

String mydata = "some string with 'the data i want' inside";
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

Result:

the data i want
holmis83
Mark Byers
  • 15
    damn .. i always forget about the non greedy modifier :( – Mihai Toader Jan 11 '11 at 20:28
  • 44
    Replace the "if" with a "while" when you expect more than one occurrence – OneWorld Aug 07 '12 at 16:25
  • 23
    Mind that matcher.find() is needed for this code sample to work. Failing to call this method will result in an IllegalStateException ("No match found") when matcher.group(1) is called. – rexford Jul 31 '14 at 14:03
  • 1
    If you want the first result it should be ".group(0)" and not ".group(1)". – mFontoura Jan 15 '15 at 19:29
  • 30
    @mFontoura group(0) would return the complete match with the outer ' '. group(1) returns what is in-between the ' ' without the ' ' themselves. – tagy22 Feb 19 '15 at 14:34
  • 4
    why does Mark use the question mark in this case? doesn't the .* match 0 or more anyway? So if there was an empty string between the two quotations it would match nonetheless? – Larry May 13 '15 at 12:25
  • 1
    This code works well, but in the result the delimiters (') is included. How to get the substring without the delimiters? – Giuseppe Bianco Jan 21 '16 at 12:31
  • 1
    This answer is slightly misleading, as the code provided returns `'the data i want'` instead of `the data i want`. If you want to remove the single quotes you should print `matcher.group(1)` instead. – Boo Radley Apr 26 '16 at 16:29
  • 1
    @BooRadley The answer was correct from the beginning, I made rollback. – holmis83 May 06 '16 at 11:09
  • 6
    @Larry this is a late reply, but ? in this case is non-greedy modifier, so that for `this 'is' my 'data' with quotes` it would stop early and return `is` instead of matching as many characters as possible and return `is' my 'data`, which is the default behavior. – Timekiller Sep 12 '16 at 14:08
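The two points raised in the comments above (group(0) versus group(1), and greedy versus non-greedy matching) can be checked with a small sketch. The class name QuoteGroupsDemo and the helper firstGroup are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteGroupsDemo {
    // Hypothetical helper: returns the requested group of the first match,
    // or null when the regex does not match anywhere in the input.
    static String firstGroup(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "this 'is' my 'data' with quotes";
        // group(0) is the whole match, including the quotes.
        System.out.println(firstGroup("'(.*?)'", input, 0)); // 'is'
        // group(1) is only what the parentheses captured.
        System.out.println(firstGroup("'(.*?)'", input, 1)); // is
        // Without the ?, .* is greedy and runs to the last quote.
        System.out.println(firstGroup("'(.*)'", input, 1));  // is' my 'data
    }
}
```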
82

You don't need regex for this.

Add apache commons lang to your project (http://commons.apache.org/proper/commons-lang/), then use:

String dataYouWant = StringUtils.substringBetween(mydata, "'");
Yang
Beothorn
  • 14
    You have to take into account how your software will be distributed. If it is something like a webstart it's not wise to add Apache commons only to use this one functionality. But maybe it isn't. Besides, Apache commons has a lot more to offer. Even though it's good to know regex, you have to be careful about when to use it. Regex can be really hard to read, write and debug. Given some context, using this could be the better solution. – Beothorn Apr 13 '15 at 14:41
  • 4
    Sometimes StringUtils is already there; in those cases this solution is much cleaner and more readable. – Gábor Nagy Sep 14 '16 at 11:58
  • 9
    It's like buying a car to travel 5 miles (when you are traveling only once a year). – prayagupa Mar 01 '17 at 20:38
  • 1
    While substring looks for a specific string or value, regex looks for a format, so it is more dynamic. You need regex if you are looking for a pattern instead of a specific value. – burak Sep 19 '17 at 10:20
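If adding a library for one call is a concern, as some of the comments above suggest, roughly the same result can be had with String.indexOf. This is only a sketch under stated assumptions: the class BetweenDemo and the helper between are hypothetical names, and the helper is not claimed to match every edge case of StringUtils.substringBetween.

```java
class BetweenDemo {
    // Hypothetical helper: returns the text between the first two
    // occurrences of tag, or null when fewer than two are present.
    static String between(String text, String tag) {
        if (text == null || tag == null || tag.isEmpty()) {
            return null;
        }
        int start = text.indexOf(tag);
        if (start < 0) {
            return null; // no opening tag
        }
        int from = start + tag.length();
        int end = text.indexOf(tag, from);
        return end < 0 ? null : text.substring(from, end);
    }

    public static void main(String[] args) {
        String mydata = "some string with 'the data i want' inside";
        System.out.println(between(mydata, "'")); // the data i want
    }
}
```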
20
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*'([^']*)'.*");
        String mydata = "some string with 'the data i want' inside";

        Matcher matcher = pattern.matcher(mydata);
        if(matcher.matches()) {
            System.out.println(matcher.group(1));
        }

    }
}
Sean McEligot
  • 3
    System.out.println(matcher.group(0)); <--- Zero based index – nclord May 13 '16 at 14:49
  • 6
    No. group(0) has special meaning, capturing groups start at index group(1) (i.e. group(1) is correct in the answer). "Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern" - Source: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#group-int- – Apriori Apr 18 '17 at 06:48
  • 2
    Bear in mind that `matches()` tries to match entire string, so if you don't have ".*" at the beginning and end of your pattern, it won't find anything. – oneturkmen Mar 09 '21 at 17:42
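The difference between matches() and find() mentioned in the last comment can be seen in a short sketch. The class and method names here are invented for the illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // matches() succeeds only if the regex covers the entire input.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() succeeds if the regex matches any part of the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String mydata = "some string with 'the data i want' inside";
        System.out.println(wholeStringMatches("'([^']*)'", mydata));     // false
        System.out.println(containsMatch("'([^']*)'", mydata));          // true
        System.out.println(wholeStringMatches(".*'([^']*)'.*", mydata)); // true
    }
}
```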
18

There's a simple one-liner for this:

String target = mydata.replaceAll("[^']*(?:'(.*?)')?.*", "$1");

By making the matching group optional, this also caters for quotes not being found by returning a blank in that case.

See live demo.
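To see both behaviors described above (the quoted text when quotes are present, an empty string when they are not), the one-liner can be wrapped in a small method. The wrapper class OneLinerDemo and the method extract are hypothetical names for this sketch.

```java
class OneLinerDemo {
    // Wraps the one-liner from this answer; the optional group means
    // input without quotes is replaced by an empty string.
    static String extract(String input) {
        return input.replaceAll("[^']*(?:'(.*?)')?.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("some string with 'the data i want' inside")); // the data i want
        System.out.println(extract("no quotes here").isEmpty());                  // true
    }
}
```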

Bohemian
15

Since Java 9

As of this version, you can use the new no-argument method Matcher::results, which returns a Stream<MatchResult>. Each MatchResult represents the result of one match operation and gives access to the matched groups and more (the MatchResult interface has existed since Java 1.5).

String string = "Some string with 'the data I want' inside and 'another data I want'.";

Pattern pattern = Pattern.compile("'(.*?)'");
pattern.matcher(string)
       .results()                       // Stream<MatchResult>
       .map(mr -> mr.group(1))          // Stream<String> - the 1st group of each result
       .forEach(System.out::println);   // print them out (or process in other way...)

The code snippet above results in:

the data I want
another data I want

The biggest advantage is ease of use when one or more results are available, compared to the procedural if (matcher.find()) and while (matcher.find()) checks and processing.
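When the matches should be kept rather than printed, the same stream can be collected into a list. A minimal sketch, assuming Java 9 or later; the class ResultsDemo and the method quoted are names chosen for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsDemo {
    // Collects the first capturing group of every match into a list.
    static List<String> quoted(String input) {
        return Pattern.compile("'(.*?)'")
                .matcher(input)
                .results()                  // Stream<MatchResult>, Java 9+
                .map(mr -> mr.group(1))     // text between the quotes
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String string = "Some string with 'the data I want' inside and 'another data I want'.";
        System.out.println(quoted(string)); // [the data I want, another data I want]
    }
}
```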

Nikolas Charalambidis
11

Because you also ticked Scala, a solution without regex which easily deals with multiple quoted strings:

val text = "some string with 'the data i want' inside 'and even more data'"
text.split("'").zipWithIndex.filter(_._2 % 2 != 0).map(_._1)

res: Array[java.lang.String] = Array(the data i want, and even more data)
Debilski
  • 4
    Such a readable solution, that's why people love Scala, I believe :) – prayagupa Mar 01 '17 at 20:42
  • 3
    Why not just `.split('\'').get(2)` or something to that extent in Java? I think you may need to get a brain scan if you think that's a readable solution - it looks like someone was trying to do some code golf to me. – ArtOfWarfare Apr 10 '17 at 17:05
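For readers who want the same split-based idea in Java, here is a rough sketch. The names SplitDemo and quotedParts are hypothetical, and one design choice differs from the Scala snippet: a trailing segment after an unclosed quote is ignored instead of returned.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Splits on the quote character and keeps the odd-indexed pieces,
    // which are the ones that sit between an opening and a closing quote.
    static List<String> quotedParts(String text) {
        // Limit -1 keeps trailing empty strings, so a closing quote at the
        // very end of the text still produces a final (empty) piece.
        String[] parts = text.split("'", -1);
        List<String> result = new ArrayList<>();
        // i + 1 < parts.length ensures the piece is followed by a closing quote.
        for (int i = 1; i + 1 < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "some string with 'the data i want' inside 'and even more data'";
        System.out.println(quotedParts(text)); // [the data i want, and even more data]
    }
}
```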
9
String dataIWant = mydata.replaceFirst(".*'(.*?)'.*", "$1");
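Two properties of this one-liner are worth checking before relying on it: when there are no quotes the input comes back unchanged, and because the leading .* is greedy, the last quoted segment is returned when there are several. The sketch below wraps the call in a hypothetical method (ReplaceFirstDemo.extract) to show both.

```java
class ReplaceFirstDemo {
    // Wraps the replaceFirst call from this answer.
    static String extract(String input) {
        return input.replaceFirst(".*'(.*?)'.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("some string with 'the data i want' inside")); // the data i want
        // No match means nothing is replaced, so the input is returned as-is.
        System.out.println(extract("no quotes here")); // no quotes here
        // The greedy prefix skips ahead to the last quoted segment.
        System.out.println(extract("a 'x' b 'y' c"));  // y
    }
}
```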
ZehnVon12
3

As in JavaScript:

mydata.match(/'([^']+)'/)[1]

The actual regexp is: /'([^']+)'/

If you use the non-greedy modifier (as per another post), it looks like this:

mydata.match(/'(.*?)'/)[1]

It is cleaner.

Mihai Toader
2

String dataIWant = mydata.split("'")[1];

See Live Demo
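Indexing the split result directly throws ArrayIndexOutOfBoundsException when the string contains no quote at all. A guarded variant might look like the sketch below; the class SplitFirstDemo, the method firstQuoted, and the use of Optional are choices made for this illustration, not part of the answer.

```java
import java.util.Optional;

class SplitFirstDemo {
    // Returns the text between the first two quotes, if both are present.
    static Optional<String> firstQuoted(String text) {
        // Limit 3 yields at most: before, inside, rest. Three pieces
        // means at least two quote characters were found.
        String[] parts = text.split("'", 3);
        return parts.length == 3 ? Optional.of(parts[1]) : Optional.empty();
    }

    public static void main(String[] args) {
        String mydata = "some string with 'the data i want' inside";
        System.out.println(firstQuoted(mydata).orElse("(none)"));           // the data i want
        System.out.println(firstQuoted("no quotes here").orElse("(none)")); // (none)
    }
}
```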

ZehnVon12
2

Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods. In your case, the start and end substrings are the same, so just call the following function.

StringUtils.substringBetween(String str, String tag)

Gets the String that is nested in between two instances of the same String.

If the start and the end substrings are different then use the following overloaded method.

StringUtils.substringBetween(String str, String open, String close)

Gets the String that is nested in between two Strings.

If you want all instances of the matching substrings, then use,

StringUtils.substringsBetween(String str, String open, String close)

Searches a String for substrings delimited by a start and end tag, returning all matching substrings in an array.

For the example in the question, to get all instances of the matching substring:

String[] results = StringUtils.substringsBetween(mydata, "'", "'");
Memin
1

In Scala,

val ticks = "'([^']*)'".r

ticks findFirstIn mydata match {
    case Some(ticks(inside)) => println(inside)
    case _ => println("nothing")
}

for (ticks(inside) <- ticks findAllIn mydata) println(inside) // multiple matches

val Some(ticks(inside)) = ticks findFirstIn mydata // may throw exception

val ticks = ".*'([^']*)'.*".r    
val ticks(inside) = mydata // safe, shorter, only gets the first set of ticks
Daniel C. Sobral
1

Add the Apache Commons Lang dependency to your pom.xml (StringUtils lives in commons-lang3, not commons-io):

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>

Then the code below works:

String result = StringUtils.substringBetween(mydata, "'", "'");
Ganesh
0

You can use a while loop to store every matching substring in a list. If you use

if (matcher.find()) { System.out.println(matcher.group(1)); }

you will get only the first matching substring, so use a loop like this to get all of them:

Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(text);
ArrayList<String> matchesEmail = new ArrayList<>();
while (m.find()) {
    String s = m.group();
    // skip duplicates
    if (!matchesEmail.contains(s)) {
        matchesEmail.add(s);
    }
}

Log.d(TAG, "emails: " + matchesEmail);
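Applied to the quoted-string pattern from the question rather than e-mail addresses, the same loop might look like the sketch below. The class UniqueMatchesDemo and the method uniqueQuoted are hypothetical, and a LinkedHashSet is swapped in for the contains() check so duplicates are dropped while the order of first appearance is kept.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueMatchesDemo {
    // Collects every distinct quoted substring, in order of first appearance.
    static Set<String> uniqueQuoted(String text) {
        Matcher m = Pattern.compile("'(.*?)'").matcher(text);
        Set<String> found = new LinkedHashSet<>();
        while (m.find()) {
            found.add(m.group(1)); // the set silently ignores duplicates
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(uniqueQuoted("'a' and 'b' and 'a'")); // [a, b]
    }
}
```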
Noah Mohamed
0

Somehow group(1) didn't work for me. I used group(0) to find the URL version.

Pattern urlVersionPattern = Pattern.compile("\\/v[0-9][a-z]{0,1}\\/");
Matcher m = urlVersionPattern.matcher(url);
if (m.find()) { 
    return StringUtils.substringBetween(m.group(0), "/", "/");
}
return "v0";
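A likely reason group(1) fails here is that the pattern above contains no parentheses, so there is no capturing group 1 and the call throws IndexOutOfBoundsException. A sketch with a capturing group added, which also removes the need for substringBetween, could look like this. The class name, the method name, and the example URLs are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlVersionDemo {
    // The parentheses create group 1, which holds the version without the slashes.
    private static final Pattern URL_VERSION = Pattern.compile("/(v[0-9][a-z]?)/");

    // Returns the version segment of the URL, or "v0" when none is found.
    static String version(String url) {
        Matcher m = URL_VERSION.matcher(url);
        return m.find() ? m.group(1) : "v0";
    }

    public static void main(String[] args) {
        // Hypothetical URLs, used only to exercise the method.
        System.out.println(version("https://example.com/api/v2b/users")); // v2b
        System.out.println(version("https://example.com/users"));         // v0
    }
}
```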
Arindam