
I'm a novice at Java and I'm working on a project that scans the source code of a website and extracts all the hyperlinks it contains. So far my project scans every 'word' of the source code using a Scanner (in.next()). However, I've been told to use delimiters to extract the hyperlinks, but I can barely find any information to help me use them. Could someone explain delimiters and how I could use them in this project? It would be really appreciated.

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;


public class HyperlinkMain {
    public static void main(String[] args) {
        // Read the address of the page to scan from standard input.
        try (Scanner in = new Scanner(System.in)) {
            String address = in.next();
            URL website = new URL(address);

            // By default a Scanner splits its input on whitespace,
            // so each call to next() returns one 'word' of the page source.
            try (Scanner inWebsite = new Scanner(website.openStream())) {
                while (inWebsite.hasNext()) {
                    // Process each 'word'.
                    System.out.println(inWebsite.next());
                }
            }

        } catch (MalformedURLException me) {
            System.out.println(me);

        } catch (IOException ioe) {
            System.out.println(ioe);
        }
    }
}
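One possible reading of "use delimiters" here: `Scanner.useDelimiter` replaces the default whitespace delimiter with any regular expression, so the Scanner splits its input wherever that pattern matches. The sketch below is an assumption about what was meant, not a definitive method. It uses the double quote `"` as the delimiter so that attribute values come back as their own tokens, and it treats a token as a link when the token before it ends with `href=`. The names `DelimiterLinks` and `extractLinks` are hypothetical, and it assumes links are written as `href="..."` with double quotes and a lowercase `href`. It reads from a String so it runs without a network connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterLinks {

    // Hypothetical helper: uses the double quote as the delimiter, so the text
    // between quotes comes back as separate tokens. A token counts as a link
    // when the token just before it ends with "href=".
    // Assumes attributes look like href="..." (double quotes, lowercase href).
    static List<String> extractLinks(Scanner source) {
        source.useDelimiter("\"");
        List<String> links = new ArrayList<>();
        String previous = "";
        while (source.hasNext()) {
            String token = source.next();
            // Ignore empty tokens; keep values that follow an href= token.
            if (previous.trim().endsWith("href=") && !token.isEmpty()) {
                links.add(token);
            }
            previous = token;
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> and "
                + "<a class=\"nav\" href=\"/b.html\">B</a></p>";
        try (Scanner scanner = new Scanner(html)) {
            // prints [http://example.com/a, /b.html]
            System.out.println(extractLinks(scanner));
        }
    }
}
```

Because the method accepts any `Scanner`, the program above could call `DelimiterLinks.extractLinks(inWebsite)` in place of its `while` loop to collect the links from the downloaded page.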

1 Answer


You could use regular expressions on the strings. Below is an existing Stack Overflow question on this topic:

How to use regular expressions to parse HTML in Java?
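A minimal sketch of the regular-expression approach, using `java.util.regex.Pattern` and `Matcher.find()` to pull out every `href` value from the page source. The pattern is an assumption: it accepts double or single quotes and any letter case, but it misses unquoted values and will also match `href` text inside comments or scripts, which is a general limitation of parsing HTML with regular expressions. `RegexLinks` and `extractLinks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLinks {

    // Assumed pattern: href, optional spaces, =, then a quoted value.
    // Group 1 is the opening quote, \1 requires the same quote to close it,
    // and group 2 is the value itself. CASE_INSENSITIVE also accepts HREF.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        // find() moves to the next match each time it is called.
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com\">Home</a>"
                + "<A HREF='/about.html'>About</A>";
        // prints [http://example.com, /about.html]
        System.out.println(extractLinks(html));
    }
}
```

To run it on the page from the question, a common idiom reads the whole stream as one string with `inWebsite.useDelimiter("\\A").next()`: the delimiter `\A` matches only at the start of the input, so the entire page becomes a single token.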

Joe