How to find a specific word in HTML using Java?

Question

I am having some trouble with this. I tried using regex, but it does not give me the exact number of matches I need; I know how many matches I am supposed to get.

I am trying to find all occurrences of a specific word in a .txt file that is just an HTML page saved as text.

The problem is that the word I am searching for can appear as an id, a class, or plain text on the page, so I need to search the entire page for the word.

Also, with regex, if the word was 'car', the pattern also matched it inside 'racecar', for example.
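A minimal sketch of the difference, assuming the search uses `java.util.regex`: a plain pattern `car` also matches the tail of `racecar`, while wrapping it in `\b` word-boundary assertions only matches the standalone word. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Counts how many non-overlapping matches the pattern finds in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "A racecar and just a car";

        // Plain substring pattern: also matches the "car" at the end of "racecar".
        Pattern plain = Pattern.compile("car");
        // \b requires a word boundary on each side, so "racecar" is skipped.
        Pattern whole = Pattern.compile("\\bcar\\b");

        System.out.println(countMatches(plain, text)); // 2
        System.out.println(countMatches(whole, text)); // 1
    }
}
```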

I looked into https://jsoup.org/. Is that the best way to go?

To be clear, in this example I want my method to find 'dog' twice in this piece of HTML:

<p id="Dog">The dog went for a walk today.</p>

I hope I am clear; this might even be able to be done with Regex, but I could have been doing it incorrectly. I was using `Pattern` with `\\bwordToBeSearchedFor\\b` as my pattern.
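One plausible reason `\\bdog\\b` finds only one match in the example above is that Java regexes are case-sensitive by default, so the `Dog` in the `id` attribute is not counted. The sketch below assumes the whole file should be searched as raw text (so ids, classes and visible text all count) and that case should be ignored. `WordCounter`, `countWord` and the file name `page.txt` are hypothetical. `Pattern.quote` escapes the search word in case it contains regex metacharacters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {

    // Counts whole-word, case-insensitive occurrences of word in text.
    // The raw text is searched, so markup (ids, classes) is included.
    static int countWord(String text, String word) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String html = "<p id=\"Dog\">The dog went for a walk today.</p>";
        System.out.println(countWord(html, "dog")); // 2

        // Hypothetical file name for the saved HTML page; skipped if absent.
        Path file = Path.of("page.txt");
        if (Files.exists(file)) {
            System.out.println(countWord(Files.readString(file), "dog"));
        }
    }
}
```

Note that this still treats the page as text rather than parsing it, which matches the goal of counting the word wherever it appears in the markup.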

"this might even be able to be done with Regex" [No it can't](https://stackoverflow.com/a/1732454/1898563). — Michael, Feb 19 '19 at 12:36
What is wrong with regex, actually? I.e. a pattern like `(?m)(?u)\W(car)\W` and HTML like `car A racecar and just a car` — Victor Gubin, Feb 19 '19 at 13:10
Check it [yourself](http://myregexp.com?regex=%5CW(car)%5CW&text=%3Chtml%3E%0A%09%3Chead%3E%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%3Ctitle%3Ecar%3C%2Ftitle%3E%0A%20%20%20%20%20%3C%2Fhead%3E%0A%20%20%20%20%20%3Cbody%3E%0A%20%20%20%20%20%20%20%20%20%20%20%3Cp%20id%3D%22car%22%20class%3D%22car%22%3EA%20racecar%20and%20just%20a%20car%3C%2Fp%3E%0A%20%20%20%20%20%20%3C%2Fbody%3E%0A%3C%2Fhtml%3E&flags=) — Victor Gubin, Feb 19 '19 at 13:13
If all you want is to count the number of exact matches, then a regular expression should be sufficient. But the real problem here is that your question is not very clear; to me it looks like you're asking us to recommend a tool, which is considered an off-topic question. — Joakim Danielson, Feb 19 '19 at 13:36
Yeah that’s true, Joakim. I just want to be able to get exact matches from the entire string but Regex hasn’t been helpful for me. — Darnold14, Feb 19 '19 at 13:38
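On the `\W(car)\W` pattern suggested in the comments: `\W` consumes one character on each side, so two matches cannot share the separator between them, and a word at the very start or end of the input has no character there for `\W` to match. `\b` is a zero-width assertion and avoids both problems. A minimal sketch comparing the two on made-up inputs; `BoundaryComparison` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryComparison {

    // Counts non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The first match consumes the space between the words,
        // so the second "car" has no preceding \W left to match.
        System.out.println(count("\\W(car)\\W", " car car ")); // 1
        System.out.println(count("\\bcar\\b", " car car "));   // 2

        // At the start or end of the input there is no character for \W.
        System.out.println(count("\\W(car)\\W", "car"));        // 0
        System.out.println(count("\\bcar\\b", "car"));          // 1
    }
}
```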


0 Answers