I have a pool of HTML files and want to search through them for some target text. The search must cover only their text content and ignore all HTML tags, headers, scripts, etc.

I tried QRegExp, Qt's regex class, but could not find a pattern that does what I'm after.

I’d appreciate any help in this regard.

Thank you.

Fargo

1 Answer

This may or may not be a good answer for you, but have you considered using a DOM parser instead? A parser separates text nodes from markup for you, which eliminates the need to filter out what is text and what is HTML markup with a regex. Sadly, I can't recommend a good one for C++, though.
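
To make the idea of searching text only concrete, here is a minimal sketch in standard C++. It is not a DOM parser and uses no Qt API: it is a hand-rolled scanner that drops tags and comments and skips the contents of `<script>`, `<style>`, and `<head>` (assuming "header" in the question means the `<head>` section). The names `extractText` and `containsText` are hypothetical helpers made up for this illustration. It does not decode entities and will trip over a `>` inside a quoted attribute value, so a real parser remains the more robust choice.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy, used so tag names can be matched case-insensitively.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: returns the text of `html` with tags and comments
// removed and the contents of <script>, <style>, and <head> skipped.
// Simplifications: entities such as &amp; are left undecoded, and a '>'
// inside a quoted attribute value ends the tag early.
std::string extractText(const std::string& html) {
    const std::string lower = toLower(html);
    std::string text;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {          // ordinary character: keep it
            text += html[i++];
            continue;
        }
        if (lower.compare(i, 4, "<!--") == 0) {   // comment: skip past "-->"
            const std::size_t end = lower.find("-->", i + 4);
            i = (end == std::string::npos) ? html.size() : end + 3;
            continue;
        }
        // Read the tag name (empty for closing tags and <!DOCTYPE ...>).
        std::size_t nameEnd = i + 1;
        while (nameEnd < lower.size() &&
               std::isalnum(static_cast<unsigned char>(lower[nameEnd]))) {
            ++nameEnd;
        }
        const std::string name = lower.substr(i + 1, nameEnd - (i + 1));

        std::size_t close = lower.find('>', i);
        if (close == std::string::npos) break;    // unterminated tag: stop
        i = close + 1;

        if (name == "script" || name == "style" || name == "head") {
            // Skip everything up to and including the matching closing tag.
            const std::size_t end = lower.find("</" + name, i);
            if (end == std::string::npos) break;
            close = lower.find('>', end);
            if (close == std::string::npos) break;
            i = close + 1;
        }
        text += ' ';   // every tag becomes a separator between words
    }
    return text;
}

// Hypothetical helper: case-sensitive search in the extracted text only.
bool containsText(const std::string& html, const std::string& needle) {
    return extractText(html).find(needle) != std::string::npos;
}
```

Calling `containsText(page, "Hello")` on a page whose body contains "Hello" returns true, while a word that appears only in a tag name, an attribute, or a script does not match. Because every tag is replaced by a space, a word split by inline markup (for example `he<b>ll</b>o`) will not match "hello"; that is a deliberate trade-off in this sketch so that text in adjacent paragraphs is not glued together.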

korona