
I have a string:

QString text = "<a href=\"/GPWIS2/pl/emitents/news/4FUNMEDIA,PL4FNMD00013,1,current,1,1;jsessionid=vD8S3MVOLWcx-Cg2ecHBojDy.undefined\">4Fun Media SA</a>";

I'd like to remove the <a...> tag, but it doesn't work. I'm trying to do something like this:

text.remove("<a.*>");

I don't know why it doesn't work.

Krzysztof Michalski
  • Try `text.remove("");` – Wiktor Stribiżew Aug 19 '15 at 18:55
  • [Did someone say "parse HTML with regular expressions"](http://stackoverflow.com/a/1732454/1620671)? – Philipp Aug 19 '15 at 19:21
  • You don't want to do that. You seem to be receiving data from the web, and you definitely want to use a DOM to safely parse that and ensure that you have a modicum of success in light of the data provider changing things cosmetically without changing the underlying structure. If the data provider provides XML, use the wonderful `QXmlStreamReader` to parse it. If the data is HTML but not XML, use the Qt Webkit Bridge and traverse the DOM using `QWebElement` & co. If you insist on regexes: TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ. – Kuba hasn't forgotten Monica Aug 19 '15 at 19:26
  • @KubaOber it's fun and all to copy the style of popular SO answers, but when you're trying to teach someone about using the DOM, it's probably not the best idea to put scrambled text in a comment, where it's much harder to read. – d0nut Aug 19 '15 at 19:51

1 Answer


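It doesn't work because `.*` is greedy and will take every character it possibly can in the match. In this case, it runs all the way to the end of the string and only gives back enough to match the last `>`, the one that closes `</a>`, so the match covers the whole string instead of just the opening tag.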

Try this: `<a.*?>`

`.*?` is the lazy version of `.*`: it matches only the minimum number of characters needed to make the match succeed. In this case, it stops at the first `>` it encounters, right before the contents of the `a` tag.
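To make the difference concrete, here is a minimal sketch using the standard library's `std::regex` (ECMAScript grammar) as a stand-in for Qt's regex classes. The helper `removeMatches` and the shortened `sample` string are assumptions made up for illustration, not part of the question's code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every match of `pattern` from `input`.
// std::regex defaults to the ECMAScript grammar, which supports `.*?`.
std::string removeMatches(const std::string& input, const std::string& pattern) {
    return std::regex_replace(input, std::regex(pattern), "");
}

// Shortened stand-in for the string in the question.
const std::string sample = "<a href=\"/news/4FUNMEDIA\">4Fun Media SA</a>";

// Greedy: `.*` extends to the last '>' in the input, so everything is removed.
std::string greedyResult() { return removeMatches(sample, "<a.*>"); }

// Lazy: `.*?` stops at the first '>', so only the opening tag is removed.
std::string lazyResult() { return removeMatches(sample, "<a.*?>"); }
```

With this sample, `greedyResult()` comes back empty, while `lazyResult()` leaves `4Fun Media SA</a>`.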

Additionally, if you also want to remove the `</a>`, try this instead: `<\/?a.*?>`

`\/` matches the `/` in `</a>`, and the `?` after it makes the slash optional, so the pattern still matches the opening `<a>` tag. `.*?` doesn't get in the way of the second match, since it can match zero characters (it is lazy, after all!).
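As a rough sketch of the "strip both tags" pattern, again with plain `std::regex` rather than Qt: the function name `stripAnchorTags` is hypothetical, and the slash is left unescaped because `/` is not a delimiter in a C++ string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strips opening and closing <a> tags and keeps the link text.
std::string stripAnchorTags(const std::string& html) {
    // Raw string literal, so the pattern needs no extra escaping.
    // `/?` makes the slash optional; `.*?` lazily consumes any attributes.
    static const std::regex anchorTag(R"(</?a.*?>)");
    return std::regex_replace(html, anchorTag, "");
}
```

Calling `stripAnchorTags("<a href=\"/x\">4Fun Media SA</a>")` should leave just `4Fun Media SA`. Note that the pattern would also hit other tags whose names start with `a`. On the Qt side, `QString::remove` with a plain string removes only literal occurrences of that text, so the pattern most likely has to be wrapped, e.g. `text.remove(QRegularExpression("</?a.*?>"))` in Qt 5 or later.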

Regex101

d0nut