I want to match all of the text in a piece of HTML code: only the plain text, including all punctuation characters, but without HTML tags, URLs, etc.

example:

<div class="description">Boys loving girls</div>

match result:

Boys loving girls

example:

<div class="description">
guys loving girls! 
</div><br />

match result:

guys loving girls!

My attempt:

(?!.*(?:http:\/\/))^[a-z0-9():+,\-.@;\$_\!*\'%\?\säüöß%]+
Tunaki
Nicy

1 Answer

Please read "How do you parse and process HTML/XML in PHP?" to learn more about parsing HTML content.

You should not use regex for this kind of task.
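
As a rough illustration of the parser-based route, here is a minimal sketch using PHP's built-in DOM extension. The helper name `textNodes` is hypothetical, as are the choices to skip `script`/`style` content and to drop any text node that contains a URL; the `<?xml encoding="UTF-8">` prefix is a common workaround so that characters such as ä, ö, ü and ß are read as UTF-8, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper: returns the trimmed, non-empty text nodes of an HTML
// fragment, skipping any text that contains an http(s) or ftp URL.
function textNodes(string $html): array
{
    // Collect libxml warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The prefix is an encoding hint (assumption: the input is UTF-8).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // All text nodes except those inside <script> or <style>.
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');

    $result = [];
    foreach ($nodes as $node) {
        $text = trim($node->nodeValue);
        if ($text === '' || preg_match('~(?:https?|ftp)://~i', $text)) {
            continue; // whitespace-only node, or text containing a URL
        }
        $result[] = $text;
    }
    return $result;
}

print_r(textNodes('<div class="description">Boys loving girls</div>'));
print_r(textNodes("<div class=\"description\">\nguys loving girls! \n</div><br />"));
```

Because the parser separates markup from text, no character whitelist is needed and attribute values such as `href` URLs never show up in the result.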


If you want to use regex anyway, then try the following regex pattern:

$pattern = '/^(?!.*(?:https?|ftp):\/\/)(?:[^>]*>|)\s*([^<]+)(?:<.*|)\s*$/';
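
A short usage sketch for this pattern, assuming it is applied with `preg_match` to one snippet at a time. The helper name `extractText` is hypothetical, and the `trim()` call is an addition, since the capture group can keep trailing whitespace when the text is followed by a line break.

```php
<?php
// Pattern copied from the answer.
$pattern = '/^(?!.*(?:https?|ftp):\/\/)(?:[^>]*>|)\s*([^<]+)(?:<.*|)\s*$/';

// Hypothetical helper: returns the captured text, or null when the pattern
// does not match (for example because the input contains a URL).
function extractText(string $pattern, string $input): ?string
{
    if (preg_match($pattern, $input, $m) === 1) {
        return trim($m[1]); // group 1 holds the text between the tags
    }
    return null;
}

var_dump(extractText($pattern, '<div class="description">Boys loving girls</div>'));
var_dump(extractText($pattern, 'guys loving girls!'));
var_dump(extractText($pattern, 'visit http://example.com today'));
```

Note that the pattern captures only the first run of text after the first tag, so it suits short snippets like the examples in the question rather than whole pages.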
Ωmega
  • Isn't there a simple regex solution possible? Like excluding <.*> or so? – Nicy Jun 25 '12 at 23:02
  • Sorry, that won't work, because there could also be plain text without HTML; see the edited example in the first post. – Nicy Jun 25 '12 at 23:09
  • Looks good, but the URL exclusion is missing. I've tried this: (?!.*(?:https?:\/\/))(?:>\s*(.*?)\s*<|^\s*([^><]*?)\s*$) but not all URLs are excluded, hmm. – Nicy Jun 25 '12 at 23:20
  • OK, when I disable "match at line breaks ^$", it does not work with normal text without HTML tags etc. – Nicy Jun 26 '12 at 15:07
  • @Nicy - please see the updated answer and please clean up (remove) the comments above to make this post clear. – Ωmega Jun 26 '12 at 15:45
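
One likely reason for the "not all URLs are excluded" behaviour mentioned in the comments is that the negative lookahead in that pattern is not anchored: the regex engine can start a match at a position after the URL, where the lookahead no longer sees it. The sketch below compares that pattern with the anchored one from the answer; the sample input is an assumption chosen for illustration.

```php
<?php
// Sample input (assumption): a link whose href contains a URL.
$input = '<a href="http://example.com">link</a>';

// Pattern from the comment: the lookahead can be tried at any start position.
$unanchored = '/(?!.*(?:https?:\/\/))(?:>\s*(.*?)\s*<|^\s*([^><]*?)\s*$)/';

// Pattern from the answer: the lookahead sits directly after ^, so it always
// inspects the input from its very beginning.
$anchored = '/^(?!.*(?:https?|ftp):\/\/)(?:[^>]*>|)\s*([^<]+)(?:<.*|)\s*$/';

// Matches: the engine starts at the ">" after the URL, where the lookahead passes.
$unanchoredHit = preg_match($unanchored, $input, $m);

// Does not match: the lookahead at position 0 sees the URL and rejects the input.
$anchoredHit = preg_match($anchored, $input);

var_dump($unanchoredHit, $m[1] ?? null, $anchoredHit);
```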