
I have a big string and I want to extract the links from it. I can print the lines that contain a link with this pattern:

 Pattern pattern = Pattern.compile(".*(?<=overlay-link\" href=\").*?(?=\">).*");

Example output:

<a title="TITLE" class="overlay-link" href="LINK HERE"></a>

When I try String.replaceAll, the regex deletes the link and prints everything else instead.

Example: <a title="TITLE" class="overlay-link" href=""></a>

I am new to regex. Can you help me?

Here is the full code:

String content;    
Pattern pattern = Pattern.compile(".*(?<=overlay-link\" href=\").*?(?=\">).*");

try {
    Scanner scanner = new Scanner(new File("sourceCode.txt"));
    while (scanner.hasNext()) {
        content = scanner.nextLine();
        if (pattern.matcher(content).matches()) {      
            System.out.println(content.replaceAll("(?<=overlay-link\" href=\").*?(?=\">)", ""));
        }
    }
} catch (IOException ex) {
    Logger.getLogger(SourceCodeExample.class.getName()).log(Level.SEVERE, null, ex);
}
John W.
  • Don’t use regular expressions to parse XML or HTML. See http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg – VGR Mar 02 '17 at 16:19
  • But I have to use a regular expression. – John W. Mar 02 '17 at 18:59

1 Answer


If I understand your question correctly, you want to pull out just the link in the href attribute.

To do this, use a capture group in the regex itself instead of calling replaceAll.

The replaceAll method is doing exactly what its documentation says: it finds the link, replaces it with an empty string, and returns the full resulting string. That is the opposite of what you want.
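To make that concrete, here is a minimal sketch that applies the replaceAll call from the question to the sample tag. The class name ReplaceAllDemo and the helper stripLink are made up for illustration; the regex and input are the ones from the question.

```java
public class ReplaceAllDemo {
    // Lookaround pattern copied from the question: it matches only the href value.
    static final String LINK_REGEX = "(?<=overlay-link\" href=\").*?(?=\">)";

    // Hypothetical helper: returns the line with the matched link replaced by "".
    static String stripLink(String line) {
        return line.replaceAll(LINK_REGEX, "");
    }

    public static void main(String[] args) {
        String line = "<a title=\"TITLE\" class=\"overlay-link\" href=\"LINK HERE\"></a>";
        // The link is removed and everything around it is kept.
        System.out.println(stripLink(line));
    }
}
```

Running it prints the tag with an empty href, which is the output reported in the question.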

The regex you should use is:

.*(?<=overlay-link\" href=\")(.*?)(?=\">).*

Notice the capture group () around the link.

This lets you find the match and read capture group 1. I found a good example of how to do this in another question (the important snippet is pasted below).

String line = "This order was placed for QT3000! OK?"; //<a> tag string
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)"); //insert regex provided above
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1)); //This will be your link
    System.out.println("group 2: " + matcher.group(2));
    System.out.println("group 3: " + matcher.group(3));
}

The comments in the snippet were added by me.

Note: group 0 represents the whole match, not the pattern.
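Putting the pieces together, a rough sketch of the capture-group approach on the tag from the question could look like this. The class name LinkExtractor and the helper extractLink are hypothetical; only the regex comes from the answer. Because the pattern starts with a greedy .*, it is assumed here that each line contains at most one such link.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Regex from the answer; capture group 1 is the href value.
    private static final Pattern LINK_PATTERN =
            Pattern.compile(".*(?<=overlay-link\" href=\")(.*?)(?=\">).*");

    // Hypothetical helper: returns the captured link, or empty if the line has no match.
    static Optional<String> extractLink(String line) {
        Matcher matcher = LINK_PATTERN.matcher(line);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "<a title=\"TITLE\" class=\"overlay-link\" href=\"LINK HERE\"></a>";
        // Print only the link instead of the whole line.
        extractLink(line).ifPresent(System.out::println);
    }
}
```

In the loop from the question, the replaceAll call would be swapped for a call like extractLink(content), so that only the link is printed.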

jjspace