Is there a default character set used by HTML forms, or a default value for the accept-charset attribute?
We're experiencing problems with character encoding in our online forms.
The HTML pages are set to use the ISO-8859-1 character set (via a content meta tag), but the forms have no accept-charset attribute.
The back-end databases use UTF-8 encoding.
I'm not sure why two different character sets are used here; that decision was made before my time and can't easily be changed.
Most of the time, everything works fine. The problem comes when someone enters a character that isn't in the ISO-8859-1 character set: it displays correctly in the browser, but arrives at the back end as an unknown entity. Bizarrely, it then displays correctly again when it is sent back to the browser.
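To make the symptom concrete, here is a rough Python simulation of what I suspect is happening. It assumes the browser encodes form fields in the page's charset (ISO-8859-1) and replaces any character it can't represent with an HTML numeric character reference such as &#8364;, which is what Python's xmlcharrefreplace error handler does. The function names are made up for this sketch; it is not our actual code.

```python
import html
from urllib.parse import quote_plus, unquote_plus


def browser_encode_latin1(value: str) -> str:
    """Hypothetical helper: approximate the urlencoded field a browser sends
    from an ISO-8859-1 page. Characters outside ISO-8859-1 become numeric
    character references (assumption about browser behaviour)."""
    return quote_plus(value, encoding="latin-1", errors="xmlcharrefreplace")


def server_decode_latin1(body: str) -> str:
    """Hypothetical helper: decode the field as ISO-8859-1 on the server."""
    return unquote_plus(body, encoding="latin-1")


if __name__ == "__main__":
    sent = browser_encode_latin1("café €5")
    stored = server_decode_latin1(sent)
    print(sent)    # the euro sign travels as the text "&#8364;"
    print(stored)  # what ends up in the database: an entity, not the character
    # Rendering the stored text as HTML turns the entity back into the euro sign,
    # which would explain why it looks correct again in the browser.
    print(html.unescape(stored))
```

If this matches what we see, the "unknown entity" in the database is simply the literal text &#8364; rather than a corrupted character.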
I've assumed so far that even if a user enters a character that isn't in the ISO-8859-1 charset, the page still uses the character set from the meta tag when sending the data to the server, which would explain the odd entity stored in the database. Does this sound like a feasible explanation, and if so, would changing the content type (charset) of the HTML pages be a reasonable solution to the problem?
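For comparison, here is a similar sketch of the proposed fix, assuming the pages (or the forms' accept-charset attribute) are switched to UTF-8 so the browser submits fields as UTF-8. Again, the helper names are hypothetical and this only models the encoding step, not our real back end.

```python
from urllib.parse import quote_plus, unquote_plus


def browser_encode_utf8(value: str) -> str:
    """Hypothetical helper: approximate the urlencoded field a browser sends
    from a UTF-8 page. Every character is representable, so no
    numeric character references are needed."""
    return quote_plus(value, encoding="utf-8")


def server_decode_utf8(body: str) -> str:
    """Hypothetical helper: decode the field as UTF-8, matching the database."""
    return unquote_plus(body, encoding="utf-8")


if __name__ == "__main__":
    sent = browser_encode_utf8("café €5")
    print(sent)
    print(server_decode_utf8(sent))  # the original text, euro sign included
```

Under these assumptions the characters reach the UTF-8 database unchanged, which is why aligning the page charset with the back end looks like the natural fix.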
Cheers.