I am using Tidy (with PHP5) with UTF-8 input, output, and char encoding enabled. When I clean a string with an `&nbsp;` in it, Tidy replaces it with an odd character. I've tried changing the Tidy config, but nothing I try seems to work.

Before Tidy:

This is a test. &nbsp;Why does this not work?

After Tidy:

This is a test. ▒Why does this not work?

I don't know what the character is, but I assume it has something to do with how the entities are encoded in UTF-8. Any ideas as to how I can get Tidy to just leave the `&nbsp;` alone?

Slickrick12
  • no-break space is a different character than space in utf8 : http://www.utf8-chartable.de/ I guess you'll have to use `str_replace` before Tidy – IcanDivideBy0 Jul 12 '11 at 17:09
  • I need the `&nbsp;` in there though because without it, HTML won't render two spaces on the screen. – Slickrick12 Jul 12 '11 at 17:17
  • What about using `&#160;` instead of `&nbsp;`? Maybe tidy's looking for it explicitly? – Brad Christie Jul 12 '11 at 17:55
  • I tried this, and it gives the same result. I think it's trying to encode an actual non-breaking space character instead of leaving the entity alone. I would like tidy to just treat it like plain text and skip any conversion of the entity itself. – Slickrick12 Jul 12 '11 at 18:29

1 Answer

Have you tried the `preserve-entities` config option?
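
For illustration, here is a minimal sketch of how that option might be passed through PHP's tidy extension. The sample string is taken from the question; the other config keys (`show-body-only` and the encoding options) are assumptions about the asker's setup, not something stated in the thread, so adjust them to match your own config.

```php
<?php
// Sketch only: requires the PHP tidy extension to be loaded.
// 'preserve-entities' is the option suggested above; the remaining
// keys are assumed here to mirror a UTF-8, fragment-only setup.
$config = array(
    'preserve-entities' => true,    // keep well-formed entities as written in the input
    'show-body-only'    => true,    // assumed: return only the fragment, not a full document
    'input-encoding'    => 'utf8',
    'output-encoding'   => 'utf8',
);

$html = 'This is a test. &nbsp;Why does this not work?';

$tidy = new tidy();
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

$output = tidy_get_output($tidy);
echo $output;
```

If the entity still comes out converted, it may be worth checking which libtidy version your PHP build links against, since option support can vary between versions.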

gere