I have a string like this (it's an empty paragraph), saved from heavily edited and post-processed TinyMCE input.
This is how it looks after echo, in the HTML source code in the browser:
<p> </p>
Now, I need to remove those empty paragraphs.
I have already tried
$output = str_ireplace("<p> </p>", "", $string);
$output = preg_replace("/<p> <\/p>/", "", $string);
$output = preg_replace("/<p>[ \t\n\r]*<\/p>/", "", $string);
$output = preg_replace("/<p>[\s]*<\/p>/", "", $string);
and many more variations with no luck. It's still there, intact. I have also tried mb_ereg_replace and matching the `&nbsp;` entity, which apparently isn't the case.
On the other hand, this works:
$output = preg_replace("/<p>.*<\/p>/", "", $string);
but of course it also strips paragraphs with actual content.
What else could that "space-like" character be? How am I supposed to match it?
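One way to answer this kind of question is to look at the raw bytes rather than the rendered output. The sketch below is only an illustration: the sample string is an assumption that uses a UTF-8 non-breaking space as a stand-in for the unknown character, and the variable names are made up.

```php
<?php
// Hypothetical sample: a paragraph whose only content is a UTF-8
// non-breaking space (U+00A0, encoded as the two bytes C2 A0).
$string = "<p>\xC2\xA0</p>";

// Dump the raw bytes as hex. A regular space would appear as 20;
// here the bytes between 3c703e (<p>) and 3c2f703e (</p>) are c2a0.
$hex = bin2hex($string);
echo $hex, PHP_EOL;

// Direct byte-level check for the UTF-8 encoding of U+00A0.
$hasNbsp = strpos($string, "\xC2\xA0") !== false;
var_dump($hasNbsp);
```

Neither of the bytes C2 and A0 is an ASCII whitespace character, which is consistent with a literal space in the pattern not matching.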
SOLVED: Thanks to ibizaman and this thread, I've found the character. It is a non-breaking space (nbsp), Unicode code point U+00A0 (decimal 160). See http://unicodelookup.com/#160/1
This works:
$output = preg_replace("/<p>[\x{00A0}\s]*<\/p>/u", "", $string);
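For reference, here is a minimal sketch wrapping that pattern in a function, with a small sample to try it on. The function name and the sample input are hypothetical; it assumes the string is valid UTF-8, which the `u` modifier requires.

```php
<?php
// Hypothetical helper name; the pattern is the one shown above.
function remove_empty_paragraphs(string $html): string
{
    // \x{00A0} matches the non-breaking space, \s matches ordinary whitespace.
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/<p>[\x{00A0}\s]*<\/p>/u', '', $html);

    // preg_replace returns null on error (for example invalid UTF-8),
    // so fall back to the unchanged input in that case.
    return $result ?? $html;
}

// Sample input: two real paragraphs, one NBSP-only, one whitespace-only.
$input = "<p>Hello</p><p>\xC2\xA0</p><p> \n</p><p>World</p>";
$clean = remove_empty_paragraphs($input);
echo $clean, PHP_EOL;
```

With this input, only the two paragraphs with actual text remain.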
As pointed out by mcrumley, this might work even better:
"/<p>[\p{Zs}\s]*<\/p>/iu"
"/<p>[^a-zA-Z0-9]*<\/p>/" should do it, although it's maybe too restrictive. The rationale is that a ^ at the beginning of the brackets negates it.
– ibizaman Nov 20 '13 at 13:23
`<p>[^<]*<\/p>`... Anyway, check the page source to be sure... I remember last time, a similar situation made me crazy :S
– Enissay Nov 20 '13 at 13:26
`#<p>\s*</p>#` should work. What's the exact output? Could you give the hex value of those spaces? A wild guess: try the `u` modifier, `#<p>\s*</p>#u`
– HamZa Nov 20 '13 at 13:26
"/<p>[^a-zA-Z0-9]*<\/p>/" is a good idea, it works as a nice workaround, but I might have to enhance it a bit, thanks. At least something.
– Saix Nov 20 '13 at 13:38