
I'm having a little bit of trouble making a sticky form that will remember what is entered in it on form submission if the value has double quotes. The problem is that the HTML is supposed to read something like:

<input type="text" name="something" value="Whatever value you entered" />

However, if the phrase: "How do I do this?" is typed in with quotes, the resulting HTML is similar to:

<input type="text" this?="" do="" i="" how="" value="" name="something"/>

How would I have to filter the double quotes? I've tried it with magic quotes on and off, I've used stripslashes and addslashes, but so far I haven't come across the right solution. What's the best way to get around this problem for PHP?

VirtuosiMedia

3 Answers

You want htmlentities().

<input type="text" value="<?php echo htmlentities($myValue); ?>">
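To show the escaping in the context of the question's form, here is a minimal sketch. The helper name `render_input` is made up for this illustration, and the value is a literal standing in for whatever would come from `$_POST` on a real page. The flags and charset are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical helper: builds the text input with the name and value escaped
// so that quotes in the submitted text cannot terminate the attribute.
function render_input($name, $value)
{
    return '<input type="text" name="' . htmlentities($name, ENT_QUOTES, 'UTF-8')
         . '" value="' . htmlentities($value, ENT_QUOTES, 'UTF-8') . '" />';
}

// In a real page this would be read from $_POST; a literal is used here.
$submitted = '"How do I do this?"';

echo render_input('something', $submitted), "\n";
// <input type="text" name="something" value="&quot;How do I do this?&quot;" />
```

The browser decodes `&quot;` back to a double quote when it displays the field, so the user sees exactly what was typed.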

Greg

The above will encode every character that has an HTML entity. I prefer to use:

htmlspecialchars($myValue, ENT_QUOTES, 'utf-8');

This will only encode:

'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'

You could also run strip_tags() on $myValue to remove HTML and PHP tags.
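As a rough side-by-side of the three functions mentioned so far, the sketch below runs them on one sample string. The sample value is invented for the illustration, and the accented character is written as explicit UTF-8 bytes so the result does not depend on the file's encoding.

```php
<?php
// Sample input: "café" <b>it's</b>  (é written as its UTF-8 byte sequence)
$value = "\"caf\xC3\xA9\" <b>it's</b>";

// Encodes only the reserved characters & " ' < >; the é is left alone.
$special = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
// &quot;café&quot; &lt;b&gt;it&#039;s&lt;/b&gt;

// Also encodes é, because it has a named entity.
$entities = htmlentities($value, ENT_QUOTES, 'UTF-8');
// &quot;caf&eacute;&quot; &lt;b&gt;it&#039;s&lt;/b&gt;

// Removes the tags but does not escape anything, so the quotes remain.
$stripped = strip_tags($value);
// "café" it's

echo $special, "\n", $entities, "\n", $stripped, "\n";
```

Note that strip_tags() leaves the double quotes in place, so its output would still need escaping before going into an attribute value.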

thesmart
  • Also, make sure your text encoding is UTF-8 for the above. You can usually omit that last parameter in htmlspecialchars if you'd like. – thesmart Nov 07 '08 at 23:34
  • You don't need to specify the charset: http://stackoverflow.com/questions/6181299/let-htmlspecialchars-use-utf-8-as-default-charset/6206252#6206252 – dynamic Nov 27 '12 at 20:33

This is what I use:

htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5, 'UTF-8')
  • ENT_QUOTES tells PHP to convert both single and double quotes, which I find desirable.
  • ENT_SUBSTITUTE and ENT_DISALLOWED deal with invalid Unicode. They're quite similar - as far as I understand, the first substitutes invalid code unit sequences, i.e. invalidly encoded characters or sequences that do not represent characters, while the second substitutes invalid code points for the given document type, i.e. characters which are not allowed for the document type specified (or the default if not explicitly specified). The documentation is undesirably laconic on them.
  • ENT_HTML5 is the document type I use. You can use a different one, but it should match your page doctype.
  • UTF-8 is the encoding of my document. I suggest that, unless you are absolutely sure you're using PHP 5.4.0 or later, you explicitly specify the encoding, especially if you'll be dealing with non-English text. A host I do some work on uses 5.2.something, which defaults to ISO-8859-1 and produces gibberish.
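Since the call above is long, one option is to wrap it in a small helper. The name `e` is only an assumption for this sketch, not something from PHP itself. Note that the ENT_SUBSTITUTE, ENT_DISALLOWED and ENT_HTML5 constants are only defined from PHP 5.4.0 onward.

```php
<?php
// Hypothetical helper: escapes a string for use in HTML5 text or attributes.
function e($string)
{
    return htmlspecialchars(
        $string,
        ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5,
        'UTF-8'
    );
}

// With ENT_HTML5, a single quote becomes &apos; rather than &#039;.
echo '<input type="text" name="something" value="', e('"It\'s" <ok>'), '" />', "\n";
// <input type="text" name="something" value="&quot;It&apos;s&quot; &lt;ok&gt;" />
```

Because ENT_SUBSTITUTE is set, a string containing an invalid UTF-8 sequence comes back with a replacement character in place of the bad bytes instead of coming back empty.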

As thesmart suggests, htmlspecialchars encodes only reserved HTML characters while htmlentities converts everything that has an HTML representation. In most contexts either will do the job. Here is a discussion on the subject.

One more thing: it is best practice to keep magic quotes disabled, since they give a false sense of security and were deprecated in PHP 5.3.0 and removed in 5.4.0. If they are enabled, each quote in your fields will be prefixed with a backslash on postback (and multiple postbacks will add more and more slashes). I see that the OP is able to change the setting, but for future reference: if you are on a shared host or otherwise don't have access to php.ini, the easiest way is to add

php_flag magic_quotes_gpc Off

to the .htaccess file.
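If the setting cannot be changed at all, a common fallback is to undo the added slashes at the top of the script. The sketch below assumes this approach; the helper name `undo_magic_quotes` is made up for the illustration.

```php
<?php
// Hypothetical helper: recursively removes the backslashes that magic quotes
// added, handling nested arrays such as those produced by name="field[]".
function undo_magic_quotes($data)
{
    return is_array($data)
        ? array_map('undo_magic_quotes', $data)
        : stripslashes($data);
}

// get_magic_quotes_gpc() was removed in PHP 8.0, so check that it exists.
// The @ silences the deprecation notice that PHP 7.4 raises for the call.
if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc()) {
    $_POST = undo_magic_quotes($_POST);
}
```

On PHP 5.4 and later the condition is never true, so the block is harmless to leave in place.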

mcmlxxxvi
  • ENT_HTML5 is redundant with htmlspecialchars. See http://stackoverflow.com/a/14532168/427545. Nice breakdown though, +1 – Lekensteyn Jan 26 '13 at 00:26