I have a string like this:

<a href="https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;" style="font-family:Myriad Pro,arial,tahoma,serif;color:#fff;font-size:14px;text-decoration:none;font-weight:bold" title="Confirm tenant creation" target="_blank">
                            <div style="font-family:'Lucida Grande',sans-serif;border-radius:5px;width:120px;min-height:40px;line-height:40px;border:1px solid #577e15;color:#fff;text-align:center;background:#e77431;margin:15px 0 15px">
                                Confirm
                            </div>
                        </a>

and I need to extract only the href value, using a regexp:

https://host-test.com/create?userName=test3&amp;user-mail=myemail@gmail.com&amp;id=14b72820-3855-4f2b-9a39-543ced6784a0&amp;downloadurl=https://host-test.com:443/123/rest/tmp-z7vvymo3wmfzke/vfs/v2/downloadzip/&amp;projectid=d29ya3NwYWNleXFpYXlwZjgwb2sxNDA2MjovY3JlYXRlQWNj:createAcc;

The href value can be different each time, shorter or longer.

Roman Iuvshin
  • `I need extract using regexp only` just to make sure, you can't use any parsers? – Pshemo Aug 15 '13 at 18:19
  • I think I [read somewhere](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) that you can't parse HTML with RegEx or you'd unleash hell on earth or something. – Mike Christensen Aug 15 '13 at 18:20
  • no, I know how to do this with parsers.. – Roman Iuvshin Aug 15 '13 at 18:20
  • @MikeChristensen I prefer [the less subjective article](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). – Bernhard Barker Aug 15 '13 at 18:24

2 Answers

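String href = myString.replaceFirst("(?s)^<a\\s+href\\s*=\\s*\"([^\"]+)\".*", "$1");

assuming myString contains your string with the a element. The (?s) flag lets the trailing .* match across line breaks, so the rest of the multi-line element is removed as well.

As href attributes cannot be nested, this should be fine and no full HTML parser is needed. The restrictions are that it only finds href attributes in double quotes, and that href must be the first attribute of an a tag at the very start of the string.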
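
The double-quote restriction can be relaxed with a backreference. The following is only a sketch under the same assumption that href is the first attribute; the class name `HrefQuoteSketch` and the method `extractHref` are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefQuoteSketch {
    // Group 1 captures the opening quote (" or '); \1 requires the same
    // character as the closing quote. Group 2 is the attribute value.
    // Assumption: href is the first attribute of the <a> tag.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the href value, or null when nothing matches.
    static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractHref(
                "<a href=\"https://host-test.com/create?userName=test3\" target=\"_blank\">"));
        System.out.println(extractHref(
                "<a href='https://host-test.com/create?userName=test3'>"));
    }
}
```

Unlike the `replaceFirst` one-liner, this uses `find()`, so the a element does not have to sit at the very start of the string.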

FrankPl

For this particular string you can try something like

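import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("<a\\shref=\"([^\"]+)");
// or, if you can't use group numbers, use the look-behind mechanism:
// Pattern.compile("(?<=<a\\shref=\")[^\"]+");
// (with that variant, read the match with matcher.group() instead of group(1))
Matcher matcher = pattern.matcher(yourString);
if (matcher.find())
    System.out.println(matcher.group(1));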

However, if your string can change (for example, the a tag can have other attributes before href), this may not work as expected. That is one of the reasons to use a parser rather than a regex.
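
If a parser really is not an option, the pattern can be loosened to skip over attributes that come before href. This is a rough sketch rather than a robust solution: the class name `HrefAnywhereSketch` and the method `extractHref` are hypothetical, it still assumes a double-quoted value, and it will misbehave if an earlier attribute value contains a `>` character or text that looks like ` href="..."`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefAnywhereSketch {
    // <a\b        : an <a tag (the word boundary rejects tags such as <abbr)
    // [^>]*?      : lazily skip any attributes before href, staying inside the tag
    // \shref      : href preceded by whitespace, so data-href is not matched
    // "([^"]*)"   : the double-quoted value, captured as group 1
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the first href value found, or null when nothing matches.
    static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<a title=\"Confirm tenant creation\" target=\"_blank\" "
                + "href=\"https://host-test.com/create?userName=test3\">Confirm</a>";
        System.out.println(extractHref(html));
    }
}
```

Every extra case handled this way makes the expression harder to read, which is the usual argument for switching to a parser once the input is not fully under your control.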

Pshemo