Can someone help me with a regex to match a string that begins with any opening HTML tag (e.g. <span> or <p>), immediately followed by [apple videoID=?

Example:

<span>[apple videoID= 

Here's what I've tried:

static String pattern =  "^<[^>]+>[apple videoID=";
static Pattern pattern1 = Pattern.compile(pattern);

What is wrong with the above?

VLAZ

4 Answers

You have a typo in the following line.

static String pattern = "^<[^>]+>[apple videoID=";

This string is not a valid regular expression because you have an unclosed [ right before the word apple, hence the "Unclosed character class" PatternSyntaxException. You either meant to type

static String pattern = "^<[^>]+><apple videoID=";

assuming that apple is an html tag, or

static String pattern = "^<[^>]+>\\[apple videoID=";

if you really did want the [ in front of apple. [ is a special character in regular expressions, so it must be escaped with a \. The backslash is itself a special character in Java string literals, so it must also be escaped with a \. Hence \\[.
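To make the escaping point concrete, here is a minimal sketch that compiles both versions of the pattern and then tries the escaped one against sample input. The class name, the helper methods, and the sample strings are illustrative assumptions, not part of the original question.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeBracketDemo {

    // Returns true if the given regex source compiles, false if it is rejected.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if the input starts with one tag followed by a literal "[apple videoID=".
    static boolean startsWithTagThenApple(String input) {
        // "\\[" in Java source becomes the regex \[ which matches a literal '['.
        return Pattern.compile("^<[^>]+>\\[apple videoID=").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(compiles("^<[^>]+>[apple videoID="));                 // false: unclosed character class
        System.out.println(compiles("^<[^>]+>\\[apple videoID="));               // true
        System.out.println(startsWithTagThenApple("<span>[apple videoID=123]")); // true
        System.out.println(startsWithTagThenApple("[apple videoID=123]"));       // false: no leading tag
    }
}
```

find() is used here rather than matches() because the pattern only describes the start of the string; the ^ anchor keeps the match pinned to the beginning.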

Mark

Simple as this:

<[^>]+><apple videoID=.*
Alexandru Severin

Here is a solution.

Pattern.CASE_INSENSITIVE makes the pattern match both upper-case and lower-case input.

Tested and executed.

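    package sireesh.yarlagadda;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class must not be called Pattern: that name would clash with the
    // java.util.regex.Pattern import and the file would not compile.
    public class AppleTagMatch {

        public static void main(String[] args) {
            String text = "<span><apple videoID=";

            // One tag made of letters only, followed by the literal "<apple videoID=".
            String patternString = "<[a-zA-Z]*><apple videoID=";

            Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);

            // lookingAt() matches a prefix of the input; matches() requires the whole input to match.
            System.out.println("lookingAt = " + matcher.lookingAt());
            System.out.println("matches   = " + matcher.matches());
        }

    }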
aliteralmind
Sireesh Yarlagadda
  • Oh I guess you got me wrong. I don't want to consider any html between the apple tag. I want to match the string which begins with 1 html tag. It could be span or p etc .. . 1 html tag followed by " – user3524469 Apr 11 '14 at 17:02
  • All the second asterisk does (`>*`) is match 0+ `>`s. [As you can see](http://regex101.com/r/rH9kH7), that will cause unintended consequences. – Sam Apr 11 '14 at 17:02
  • Changed the solution to fit the requirement: removed the asterisk and added case-insensitive matching. @Sam – Sireesh Yarlagadda Apr 11 '14 at 17:19
  • what's the point of `[a-zA-Z]` if you are using `CASE_INSENSITIVE` ? – njzk2 Apr 11 '14 at 19:49
  • You are right. It even works with `[a-z]`. I kept both ranges just to play safe. @njzk2 – Sireesh Yarlagadda Apr 11 '14 at 19:52
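The point from the comments about `[a-zA-Z]` being redundant under `CASE_INSENSITIVE` can be checked with a short sketch. The class name, the helper method, and the upper-case sample input are assumptions made up for illustration.

```java
import java.util.regex.Pattern;

public class CaseFlagDemo {

    // Lower-case-only character class; the second argument toggles the flag.
    static boolean prefixMatches(String input, boolean caseInsensitive) {
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        Pattern p = Pattern.compile("<[a-z]*><apple videoID=", flags);
        // lookingAt() only requires the start of the input to match.
        return p.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatches("<SPAN><APPLE VIDEOID=", true));  // true
        System.out.println(prefixMatches("<SPAN><APPLE VIDEOID=", false)); // false
        System.out.println(prefixMatches("<span><apple videoID=", false)); // true
    }
}
```

With the flag set, `[a-z]` alone already accepts upper-case tag names, so listing both ranges is harmless but not required.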

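Try this pattern:

"^<[A-Za-z]+>\\[apple videoID=$"

This pattern matches a string made up of a single opening tag followed by [apple videoID=, for example <span>[apple videoID=. Because of the trailing $, nothing may follow videoID=; drop the $ if the string continues after that point.

Hope this helps!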
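The effect of the trailing `$` in the pattern above can be seen in a small sketch that runs the same pattern with and without the end anchor. The class name, constant names, and sample inputs are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class EndAnchorDemo {

    // Pattern with the trailing $: the input must end right after "videoID=".
    static final Pattern ANCHORED = Pattern.compile("^<[A-Za-z]+>\\[apple videoID=$");

    // Same pattern without $: only the beginning of the input is constrained.
    static final Pattern PREFIX_ONLY = Pattern.compile("^<[A-Za-z]+>\\[apple videoID=");

    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String bare = "<span>[apple videoID=";
        String withValue = "<span>[apple videoID=123]";

        System.out.println(found(ANCHORED, bare));         // true
        System.out.println(found(ANCHORED, withValue));    // false: text follows "videoID="
        System.out.println(found(PREFIX_ONLY, bare));      // true
        System.out.println(found(PREFIX_ONLY, withValue)); // true
    }
}
```

Since the question asks only whether the string begins with the tag and the marker, the unanchored form is probably the closer fit.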

Naresh Ravlani