I am trying to extract the text from an HTML page without using additional packages as it is actually a part of a cs course assignment. I am trying to write a method which omits any text between a '<' and a '>' and return anything that remains. I have a well-working method which extracts all page source and that method is on the parent class of the child class which I am currently working with.
public String getUnfilteredPageContents() {
String last = "";
String rawHTML = this.getPageContents();
for(int i=0; i<rawHTML.length(); i++) {
last = last + rawHTML.charAt(i);
if(rawHTML.charAt(i) != '<') {
while(rawHTML.charAt(i) != '>') {
i++;
}
}
}
return last;
}
Any help will be appreciated. Thank you in advance.