I need to parse an HTML document, extract all URLs and the page content, and save them to a database. I don't want to use any library. I can identify links by looking for the <a> tag, but how can I extract all the content or useful text from the HTML tags?
Viewed 3,685 times
- Why don't you use any library? – Guy Feb 09 '20 at 08:12
- Gotta agree with @Guy on this one. Why re-invent the wheel? – Turing85 Feb 09 '20 at 08:13
- If you can't use a library, copy-paste everything it does? lmao – papaya Feb 09 '20 at 08:15
- I am not allowed to use a library – Scarlet Feb 09 '20 at 08:59
1 Answer
You can try the HTML parser that ships with the JDK, so no third-party library is needed: https://docs.oracle.com/javase/8/docs/api/javax/swing/text/html/parser/Parser.html
For sample usage, see the question "How to extract info from HTML with Java's own Parser?"
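
To make that more concrete, here is a minimal sketch of how the JDK's own parser could be used to collect link targets and visible text. It drives the parser through ParserDelegator and HTMLEditorKit.ParserCallback, both part of the JDK. The class name PageExtractor and its methods parse, getLinks and getText are my own hypothetical names, not anything from the question, and saving to the database is left out. Treat it as a starting point rather than a complete solution: it makes no attempt to filter out content you may not want (style sheets, for example), and it puts a space between text chunks, which can split a word that spans inline tags.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects href values and text while the JDK parser walks the page.
class PageExtractor extends HTMLEditorKit.ParserCallback {
    private final List<String> links = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Record the href attribute of every <a> tag that has one.
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Separate chunks with a space so adjacent elements do not run together.
        if (text.length() > 0) {
            text.append(' ');
        }
        text.append(data);
    }

    List<String> getLinks() {
        return links;
    }

    String getText() {
        // Collapse runs of whitespace left over from the markup.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    static PageExtractor parse(String html) throws IOException {
        PageExtractor extractor = new PageExtractor();
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared in the page; the input is already a String.
            new ParserDelegator().parse(reader, extractor, true);
        }
        return extractor;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <a href=\"https://example.com/a\">first</a>"
                + " and <a href=\"/b\">second</a></p></body></html>";
        PageExtractor result = parse(html);
        System.out.println("Links: " + result.getLinks());
        System.out.println("Text: " + result.getText());
    }
}
```

For a real page you would read the HTML from a file or URL into the Reader instead of a String, then write getLinks() and getText() to your database with plain JDBC.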

Alex Chernyshev