I want to write a script that checks a document for keywords and reports which HTML nodes contain them (possibly assigning each node a unique identifier).
I am not a professional programmer and do not know the strengths of low-level languages or things such as PLO. I am afraid of doing something very bad and unsupported.
How can I isolate the desired nodes?
My experience is with JS and PHP, and PHP only for very simple things. Also, I do not want to rely on working with DOM nodes in JS. My thoughts:
- turn the HTML into a string
- verify that the words exist on the page
- if a word exists on the page: for each node in the body element, get its first and last positions (since we know the position of every character, we can compute the position where the node's opening tag starts and the position where its closing tag ends, and so on for all nodes).
We know the position of the word (e.g. 192 to 199) and check which range it falls into (in this case, the ranges are the nodes of the HTML document).
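
The steps above can be sketched roughly as follows. This is only a minimal illustration in Python, using the standard library's html.parser module; the names NodeRangeParser and nodes_containing are hypothetical, and the numeric node id is just a counter in document order. It assumes reasonably well-formed HTML, and because it searches the raw string it would also match a keyword that happens to appear inside a tag name or attribute.

```python
from html.parser import HTMLParser


class NodeRangeParser(HTMLParser):
    """Hypothetical helper: records the character range of every element."""

    def __init__(self, html):
        super().__init__()
        self.html = html
        # Absolute offset at which each line begins, so that the
        # (line, column) pair from getpos() can become a string index.
        self.line_starts = [0]
        for i, ch in enumerate(html):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_nodes = []  # still-open elements: (tag, start, node_id)
        self.ranges = []      # finished elements: (node_id, tag, start, end)
        self.next_id = 0
        self.feed(html)
        self.close()

    def abs_offset(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self.line_starts[line - 1] + col

    def new_id(self):
        node_id = self.next_id
        self.next_id += 1
        return node_id

    def handle_starttag(self, tag, attrs):
        self.open_nodes.append((tag, self.abs_offset(), self.new_id()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: its range is the tag itself.
        start = self.abs_offset()
        end = start + len(self.get_starttag_text())
        self.ranges.append((self.new_id(), tag, start, end))

    def handle_endtag(self, tag):
        # The closing tag starts at the current position and ends at its '>'.
        end = self.html.find(">", self.abs_offset()) + 1
        # Close the nearest open element with this name. Elements left
        # unclosed inside it are simply dropped in this sketch.
        for i in range(len(self.open_nodes) - 1, -1, -1):
            if self.open_nodes[i][0] == tag:
                _, start, node_id = self.open_nodes[i]
                del self.open_nodes[i:]
                self.ranges.append((node_id, tag, start, end))
                break


def nodes_containing(html, keyword):
    """Return (keyword, word_start, word_end, tag, node_id) for each hit."""
    parser = NodeRangeParser(html)
    hits = []
    pos = html.find(keyword)
    while pos != -1:
        word_end = pos + len(keyword)
        enclosing = [r for r in parser.ranges
                     if r[2] <= pos and word_end <= r[3]]
        if enclosing:
            # The innermost node is the shortest range that holds the word.
            node_id, tag, _, _ = min(enclosing, key=lambda r: r[3] - r[2])
            hits.append((keyword, pos, word_end, tag, node_id))
        pos = html.find(keyword, pos + 1)
    return hits


page = '<body><div id="a"><p>hello world</p></div><p>bye</p></body>'
print(nodes_containing(page, "world"))
```

Taking the shortest enclosing range picks the innermost node, since any node that contains the word also lies inside all of its ancestors. A real HTML parsing library would handle malformed markup far more robustly than this sketch does.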
I need ideas from experienced programmers. It does not matter what language you program in (except for web-oriented ones); every opinion is important to me. There are probably libraries that solve such problems. I very much hope that you will understand me. English is not my native language.