I'm trying to find all words beginning with h, but I need to exclude html tags, like within this search. I have the code to find all the words starting with h:

\h\w+

I just don't know how to exclude things from my search, specifically an HTML tag.

djderik
  • 2
    Why don't you remove the HTML tags first, then apply your regex? – Tobey Dec 09 '16 at 17:59
  • What does `\h` mean? Do you mean words inside the tags? How are you getting your input? – depperm Dec 09 '16 at 18:02
  • `\h` means instances of h, I believe, and the `\w+` then searches for words starting with h. I'm trying to exclude specifically and . My input is a huge JSON file. – djderik Dec 09 '16 at 18:05
  • 2
    Please provide example text and the expected output – cmidi Dec 09 '16 at 18:06
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Daniel Pryden Dec 09 '16 at 19:13

1 Answer

Use the exclusion (negated) character class, `[^...]`:

[^<]h\w+ 
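
To see what this pattern actually returns, here is a minimal Python sketch. The sample string is an assumption made up for illustration (it is not the asker's JSON input), and Python's `re` module is used only because the question does not name a regex engine.

```python
import re

# Hypothetical sample text, not taken from the asker's data.
text = "<html> teste homem carro agharro hzete h"

# [^<] consumes one character that is not '<', then 'h', then word characters.
pattern = re.compile(r"[^<]h\w+")
matches = pattern.findall(text)

print(matches)  # [' homem', 'gharro', ' hzete']
```

The h in `<html>` is skipped because it follows `<`, but each match includes the character before the h, and the pattern also fires inside a word ("gharro" from "agharro").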

But I think this way may work better for what you want, since it generates a match for every word beginning with h that's not a

 (?!<)h\w+

Even better, do the following match:

 ((?!<)h\w+)

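(Pay close attention: there is a blank space just before the first `(`.) Note that `(?!<)` is a lookahead, so it checks the h itself and always succeeds; it is the leading space that keeps a match from starting right after `<`.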

If the text is:

html teste homem carro agharro hzete h

It will produce the full matches " homem" and " hzete"; the first capture group of each match holds the word you want: "homem" and "hzete".
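
A short Python sketch of the grouped pattern on the text above, assuming Python's `re` module (the question does not say which engine is in use). The second pattern is not from this answer: it is a hypothetical variant that swaps the leading space for a word boundary plus negative lookbehinds, so that words at the start of the text or after punctuation are also found. `tagged_text` is likewise a made-up sample.

```python
import re

# The answer's grouped pattern, applied to the answer's sample text.
answer_text = "html teste homem carro agharro hzete h"
grouped = re.compile(r" ((?!<)h\w+)")
full_matches = [m.group(0) for m in grouped.finditer(answer_text)]
words = [m.group(1) for m in grouped.finditer(answer_text)]
print(full_matches)  # [' homem', ' hzete']
print(words)         # ['homem', 'hzete']

# Hypothetical variant (an assumption, not the answer's method):
# \b requires the h to start a word; the lookbehinds reject an h that
# comes right after '<' or '</'.
tagged_text = "<html> teste homem carro agharro hzete h </html>"
boundary = re.compile(r"\b(?<!<)(?<!</)h\w+")
boundary_words = boundary.findall(tagged_text)
print(boundary_words)  # ['homem', 'hzete']
```

The variant only looks at the characters directly before the h, so it would still match attribute names such as `href` inside a tag; stripping the tags first, as suggested in the comments, avoids that case.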

I would recommend a graphical regex validation tool, so you can see live how the expressions you are writing behave. A good one is https://regex101.com/

Hope this helps.

Samamba