This is more of a theoretical question than a practical one. Some basic input gets passed to my server. In general, I encode it, but I do not double encode it.

I was thinking about any problems that could arise from that decision. One is if someone enters the following two strings on a form on my site:

Apples & Bananas
Apples &amp; Bananas

These would respectively be stored as, due to single encoding:

Apples &amp; Bananas
Apples &amp; Bananas

If I were to output them, I would decode them before doing so. The user would then see:

Apples & Bananas
Apples & Bananas

The source would be:

Apples & Bananas
Apples & Bananas

Therefore, I will have lost some of the structure of the submission, since a literal entity and the character it represents end up stored identically.
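To make the collapse concrete, here is a minimal sketch. It assumes the second submission contained the literal text `&amp;` and that the encoding is done with `htmlentities()`; the flags and charset are illustrative choices, not necessarily what my code uses.

```php
<?php
// Two submissions that differ only in whether the ampersand is already an entity.
$inputs = ['Apples & Bananas', 'Apples &amp; Bananas'];

foreach ($inputs as $input) {
    // $double_encode = false: existing entities are left untouched.
    $single = htmlentities($input, ENT_QUOTES, 'UTF-8', false);
    // $double_encode = true (the default): every & is encoded, even inside an entity.
    $double = htmlentities($input, ENT_QUOTES, 'UTF-8', true);

    echo $single, ' | ', $double, PHP_EOL;
}
// Prints:
// Apples &amp; Bananas | Apples &amp; Bananas
// Apples &amp; Bananas | Apples &amp;amp; Bananas
```

With `$double_encode` set to `false`, both inputs produce the same string, so the difference between them cannot be recovered afterwards.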

My instinct is that they should always be double encoded, but I would be curious to hear another opinion on this.

onassar
  • My suggestion: double encoding and storing the right data is better than storing bad data. You do not lose valid data, only bad data. Why mind? The user would be happy to have the corrected data. – RGV Jun 03 '13 at 23:17
  • I think you're using the wrong encoding method. If you use proper MySQL encoding then it should not change when read back from the database. – Reactgular Jun 03 '13 at 23:19
  • Also, if you use PDO objects there is no need to encode. – Reactgular Jun 03 '13 at 23:19
  • related: http://stackoverflow.com/questions/129677/whats-the-best-method-for-sanitizing-user-input-with-php?rq=1 – Reactgular Jun 03 '13 at 23:23
  • You should NEVER HTML-encode the data that you are storing in your database. This is bad practice: if you want to use the data in another application at some point, it will all be HTML encoded. You should HTML-encode the data as you render it at output time. – Geoffrey Jun 03 '13 at 23:52
  • I don't understand your original statement. If you do single encoding, `&amp;` will be turned into `&amp;amp;` so you won't store the same thing in the database. – Barmar Jun 03 '13 at 23:57
  • @Barmar single encoding will leave `&amp;` as `&amp;`. It will only do the converting if you double encode. – onassar Jun 04 '13 at 00:08
  • I disagree. If you call `htmlentities("&amp;")` it should return `"&amp;amp;"`. – Barmar Jun 04 '13 at 00:10
  • Apologies. You are right. I had a hardcoded `false` value for the `$double_encode` parameter. – onassar Jun 04 '13 at 00:38

1 Answer

Generally, you should apply only those encodings to the data that are really necessary. For a MySQL string literal, that means escaping the surrounding quote character, the escape character itself, and a few other characters.

However, the & is not a critical character in MySQL string literals and thus should not be encoded, especially not with an inappropriate encoding such as HTML character references. The HTML character references encoding would only be applied in case the data is output in HTML in a corresponding context in which plain text HTML special characters can result in a misinterpretation of user supplied data as author supplied data.
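As a rough illustration of storing the input without any HTML encoding, the sketch below uses a PDO prepared statement, which sidesteps manual string-literal escaping instead of performing it. Everything concrete here is an assumption for the example: an in-memory SQLite database stands in for MySQL (so the `pdo_sqlite` extension is needed), and the `submissions` table and the sample input are hypothetical.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE submissions (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical form input containing both a quote and an ampersand.
$input = "Tom's Apples & Bananas";

// The placeholder keeps the value out of the SQL text, so no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO submissions (body) VALUES (:body)');
$insert->execute([':body' => $input]);

// Read the value back and compare it with what was submitted.
$stored = $pdo->query('SELECT body FROM submissions')->fetchColumn();
var_dump($stored === $input); // bool(true): no HTML entities were introduced
```

The quote is handled by the database layer and the `&` passes through unchanged, so the stored value is exactly what the user typed.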

Now, regarding the ‘double encoding’: if a user enters `&amp;`, I would want my application to display it as `&amp;`, which requires encoding it as `&amp;amp;`. So I would encode whatever comes in, no matter what the user’s intention was on entering it.
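A minimal sketch of that output-time encoding could look as follows. The helper name `render_text` is hypothetical, and `htmlspecialchars()` with `ENT_QUOTES` and UTF-8 is just one reasonable choice for an HTML text context.

```php
<?php
// Hypothetical helper: encode raw stored text for an HTML text context.
function render_text(string $raw): string
{
    // $double_encode defaults to true, so an ampersand inside an entity is encoded too.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

echo render_text('Apples & Bananas'), PHP_EOL;     // Apples &amp; Bananas
echo render_text('Apples &amp; Bananas'), PHP_EOL; // Apples &amp;amp; Bananas
```

The browser decodes each reference exactly once, so the user sees precisely what was entered in both cases, and the two submissions stay distinguishable.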

Gumbo
  • Perfect, thanks for the thorough opinion, Gumbo. It's helped steer me in the right direction for my needs. – onassar Jun 04 '13 at 15:41