HTML form, character sets, and the accept-charset attribute

Question

Is there a default character set used by HTML forms? Or is there a default accept-charset attribute that is used?

We're experiencing some problems with characters and character sets in our online forms.

The HTML pages are set to use the character set ISO-8859-1 (using a content meta tag), but there is no specific accept-charset attribute set in the forms.

The databases in the back end use UTF-8 encoding.

I'm not sure why there are two different character sets used here - that decision was a bit before my time, and can't be easily changed.

Most of the time, everything runs quite happily. The problem comes when someone enters a character that's not in the ISO-8859-1 character set - it displays correctly in the browser, but comes through to the back end as an unknown entity. Really bizarrely, it then transfers back to the browser correctly.

I've assumed so far that even if a user enters a character that's not in the ISO-8859-1 charset, the page will use the character set from the meta tag when sending the data to the server, causing the odd entity to appear in the database. Does this sound like a feasible explanation, and - if so - would changing the content type of the HTML pages be a reasonable solution to the problem?

Cheers.

alex · Accepted Answer · 2011-09-28T11:27:03.243


Browsers will send the text from inputs in the same charset the page is served in. accept-charset can cause problems; if you use it, make sure it specifies the same charset as your page.
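As a rough illustration of why characters outside the page charset are the troublesome ones, here is a minimal Python sketch. The sample character is made up, and the `xmlcharrefreplace` handler is only an assumption used to mimic the numeric character reference that browsers commonly substitute when a form value cannot be encoded in the submission charset; it is not taken from your setup.

```python
# Hypothetical sample: a character that is not part of ISO-8859-1.
char = "Ω"

# Encoding it strictly in the page charset is not possible.
try:
    char.encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Assumption: mimic a browser-style fallback by substituting a
# numeric character reference for the unencodable character.
fallback = char.encode("iso-8859-1", errors="xmlcharrefreplace")

print(encodable)
print(fallback)
```

If something like this is happening, the "unknown entity" in the database could simply be literal text of that form, which a browser would turn back into the original character when the value is echoed into a page.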

The reason it shows up as an unknown entity is that your database is treating the bytes as UTF-8. When the text comes back to the page, it's just the same bytes, this time interpreted as ISO-8859-1, so it displays correctly.

However, it may cause problems if you use any of your database's string functions on the text, since the database is treating it as UTF-8.
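The byte round trip described above can be sketched in Python. This is only an illustration under assumptions: the sample text is hypothetical, and the "database" is modelled as nothing more than a UTF-8 decode of the submitted bytes.

```python
# Hypothetical form input containing a non-ASCII ISO-8859-1 character.
text = "café"

# The browser encodes the field in the page's charset.
sent = text.encode("iso-8859-1")

# A back end that assumes UTF-8 cannot make sense of the lone 0xE9 byte,
# so the character shows up as a replacement/unknown character.
seen_by_backend = sent.decode("utf-8", errors="replace")

# The stored bytes are unchanged, so a page served as ISO-8859-1
# interprets them correctly again.
shown_in_browser = sent.decode("iso-8859-1")

print(sent)
print(repr(seen_by_backend))
print(shown_in_browser)
```

The same mismatch is why string functions that assume UTF-8 (length, substring, case conversion) can misbehave on such data: they are operating on bytes that are not valid UTF-8.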

edited Sep 28 '11 at 11:27

answered Sep 28 '11 at 11:18

alex


Oh yeah - that makes sense. So - if we were to alter the character in the database to the correct UTF-8 code, then we might have a problem with it showing up incorrectly in the web page? – nihilogist Sep 28 '11 at 11:24
@nihilogist Try setting your database's charset to ISO-8859 and see what happens. – alex Sep 28 '11 at 11:28
Sadly I don't think that'll be an option on the live database, but I'll give it a stab on the old testing one and see what we can find. Thanks! – nihilogist Sep 28 '11 at 12:04
