-1

I'm trying to do my homework, but regex is new to me and I'm not sure why my code doesn't work. This is what I have to do:

Write a program that replaces in an HTML document given as string all the tags <a href=…>…</a> with corresponding tags [URL href=…]…[/URL]. Read an input, until you receive “end” command. Print the result on the console.

I wrote:

Pattern pattern = Pattern.compile("<a href=\"(.)+\">(.)+<\\/a>");
input = input.replaceAll(matcher.toString(), "href=" + matcher.group(1) + "]" + matcher.group(2) + "[/URL]");

And it throws Exception in thread "main" java.lang.IllegalStateException:

No match found for this input: href="http://softuni.bg">SoftUni</a>
Benjamin W.
    For bonus points, you should provide your teacher with a link to [this SO answer on parsing HTML with regex](http://stackoverflow.com/a/1732454/1678362) – Aaron Apr 29 '16 at 16:21

3 Answers

0

You were heading in the right direction, but you can't use a Pattern object like that.

First, change your code to call replaceAll() directly on the string with a regex string, and use the normal back references $n in the replacement string.

Your code thus converted is:

input = input.replaceAll("<a href=\"(.)+\">(.)+<\\/a>", "href=$1]$2[/URL]");

Next, fix the expressions:

input = input.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]");

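The changes were to put the + inside the capturing group, i.e. (.)+ -> (.+), and to capture the double quotes, since you have to put them back if I interpret the "spec" correctly. With (.)+ the group itself is repeated, so it ends up holding only the last character it matched, whereas (.+) captures the whole run. The replacement string also gains the opening [URL that was missing.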
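To see the difference between the two groupings concretely, here is a small Java sketch. The class name `GroupQuantifierDemo` and the helper `firstGroup` are made up for illustration; the sample link is the one from the question's error message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupQuantifierDemo {
    // Hypothetical helper: returns what group 1 captured on the first match,
    // or null if the regex does not match anywhere in the input.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://softuni.bg\">SoftUni</a>";

        // Quantifier outside the group: the group is overwritten on every
        // repetition, so only the last character survives.
        System.out.println(firstGroup("<a href=\"(.)+\">", link)); // prints g

        // Quantifier inside the group: the whole run is captured.
        System.out.println(firstGroup("<a href=\"(.+)\">", link)); // prints http://softuni.bg

        // The corrected one-liner from this answer.
        System.out.println(link.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]"));
        // prints [URL href="http://softuni.bg"]SoftUni[/URL]
    }
}
```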

Also note that you don't need to escape a forward slash. Forward slashes are just plain old characters in all regex flavors. Although some languages use forward slashes to delimit regular expressions, Java isn't one of them.
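As for the exception itself: the question doesn't show how `matcher` was created, so the following is only a guess at the cause. `Matcher.group()` throws `IllegalStateException` ("No match found") when it is called before a successful `find()` or `matches()`. If you would rather keep a compiled `Pattern`, a sketch along these lines should work; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherStateDemo {
    // Same regex as the corrected replaceAll() call, compiled once.
    static final Pattern LINK = Pattern.compile("<a href=(\".+\")>(.+)</a>");

    // Hypothetical helper: reports whether group() throws when no match
    // has been attempted yet on a fresh Matcher.
    static boolean groupThrowsBeforeFind(String input) {
        Matcher m = LINK.matcher(input);
        try {
            m.group(1); // no find() or matches() has run yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Matcher.replaceAll() does the find/replace loop itself, so there is
    // no need to call group() by hand.
    static String convert(String input) {
        return LINK.matcher(input).replaceAll("[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://softuni.bg\">SoftUni</a>";
        System.out.println(groupThrowsBeforeFind(link)); // prints true
        System.out.println(convert(link)); // prints [URL href="http://softuni.bg"]SoftUni[/URL]
    }
}
```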

Bohemian
0

Your + quantifier needs to be inside the parentheses:

<a href=\"(.+)\">(.+)<\\/a>
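For the whole assignment, one possible shape is sketched below. It is an assumption-laden sketch, not the expected solution: the class name `ReplaceATag` and the method `convertLine` are invented, the input is assumed to arrive line by line on standard input, and the quantifiers are made lazy (`.+?`) so that two links on the same line are converted separately instead of being swallowed by one greedy match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

public class ReplaceATag {
    // Group 1: the quoted href value; group 2: the link text.
    // Lazy quantifiers stop at the first closing quote and the first </a>.
    private static final Pattern LINK = Pattern.compile("<a href=(\".+?\")>(.+?)</a>");

    // Hypothetical helper: converts every <a href="...">...</a> in one line.
    static String convertLine(String line) {
        return LINK.matcher(line).replaceAll("[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        // Stop at the "end" command, or at end of input if it never comes.
        while ((line = reader.readLine()) != null && !line.equals("end")) {
            System.out.println(convertLine(line));
        }
    }
}
```

With a line such as `<a href="x">A</a> <a href="y">B</a>`, `convertLine` yields `[URL href="x"]A[/URL] [URL href="y"]B[/URL]`; the greedy version would treat everything from the first `<a` to the last `</a>` as a single link.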
Scott Weaver
0
 using System;
 using System.Collections.Generic;
 using System.Linq;
 using System.Text;
 using System.Text.RegularExpressions;
 using System.Threading.Tasks;

 namespace _06.Replace_a_Tag
 {
    class Program
     {
       static void Main(string[] args)
        {
            string text = Console.ReadLine();
            while (text != "end")
            {
                string pattern = @"<a.*?href.*?=(.*?)>(.*?)<\/a>";
                // The pattern captures two groups. Both are lazy (.*?), so
                // several links on one line are matched separately.
                // Group 1 is the href value, including its quotes,
                // for example : "https://stackoverflow.com"

                // Group 2 is the link text between the tags
                // (it may be empty),
                // for example : This is some text

                string replace = @"[URL href=$1]$2[/URL]";
                // we use $ char and a number (like placeholders)
                // for example : $1 means take whatever you find from group 1
                //        and  : $2 means take whatever you find from group 2

                string replaced = Regex.Replace(text, pattern , replace);

                //  In a specific input string (text), replaces all strings 
                //  that match a specified regular expression (pattern ) with 
                //  a specified replacement string (replace)

                Console.WriteLine(replaced);

                text = Console.ReadLine();
            }


       }
    }
  }

  //  input : <ul><li><a href=""></a></li></ul>
  //  output: <ul><li>[URL href=""][/URL]</li></ul>