Questions tagged [character-set]

A character set maps a set of characters to specific numeric values, e.g. ASCII, UTF-8 and ISO-8859-1.


Modern programming languages, editors and tools support encoding and decoding data between their internal representations and specific character sets. Examples of character sets include ASCII, UTF-8 and ISO-8859-1.

Choose the character set for transmitting and persisting data with care, particularly for text that can contain accented characters (as in European languages like French or German) or that is written in a completely different script (such as Japanese); see internationalisation (also referred to as i18n).
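As a rough illustration of the points above, the following Python sketch shows the same text encoded under different character sets, what happens when bytes are decoded with the wrong one, and how to check whether a character set can represent a given string. The helper name `can_encode` and the sample strings are made up for this example.

```python
# Illustrative sketch: the same text maps to different bytes depending on
# the character set used for encoding.
text = "café"

utf8_bytes = text.encode("utf-8")         # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # é becomes one byte: 0xE9

# Decoding with the matching character set recovers the original text.
round_trip_ok = (
    utf8_bytes.decode("utf-8") == text
    and latin1_bytes.decode("iso-8859-1") == text
)

# Decoding with the wrong character set raises no error here, but the
# result is garbled text (mojibake).
mojibake = utf8_bytes.decode("iso-8859-1")


def can_encode(s: str, charset: str) -> bool:
    """Hypothetical helper: True if every character of s exists in charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"

if __name__ == "__main__":
    print(len(utf8_bytes), len(latin1_bytes))
    print(round_trip_ok)
    print(mojibake)
    for charset in ("ascii", "iso-8859-1", "utf-8"):
        print(charset, can_encode(text, charset), can_encode(japanese, charset))
```

In this sketch only UTF-8 can represent both sample strings, which is one reason it is a common default for transmission and storage; whether it is the right choice still depends on the systems at either end.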

120 questions
1299
votes
9 answers

What's the difference between utf8_general_ci and utf8_unicode_ci?

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?
KahWee Teng
  • 13,658
  • 3
  • 21
  • 21
605
votes
21 answers

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools…
Antti Kissaniemi
  • 18,944
  • 13
  • 54
  • 47
341
votes
4 answers

What does character set and collation mean exactly?

I can read the MySQL documentation and it's pretty clear. But, how does one decide which character set to use? On what data does collation have an effect? I'm asking for an explanation of the two and how to choose them.
Sander Versluys
  • 72,737
  • 23
  • 84
  • 91
30
votes
2 answers

About the "Character set" option in Visual Studio

I have an inquiry about the "Character set" option in Visual Studio. The Character Set options are: Not Set; Use Unicode Character Set; Use Multi-Byte Character Set. I want to know what the difference is between the three options in Character…
Lion King
  • 32,851
  • 25
  • 81
  • 143
17
votes
2 answers

SQL Server: set character set (not collation)

How does one set the default character set for fields when creating tables in SQL Server? In MySQL one does this: CREATE TABLE tableName ( name VARCHAR(128) CHARACTER SET utf8 ) DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; Note…
dotancohen
  • 30,064
  • 36
  • 138
  • 197
9
votes
2 answers

How to convert character set from ISO8859_1 to UTF8 in Firebird?

I have a database in Firebird 2.5 filled with data. I need to change the character set from UTF-8 to ISO8859_1, I tried: alter database default character set ISO8859_1 collation ES_ES But it doesn't work. How can I convert the character set?
Quiron
  • 340
  • 2
  • 12
8
votes
2 answers

Does using ASCII/Latin Charset speed up the database?

It would seem that using the ASCII charset for most fields and then specify utf8 only for the fields that need it would reduce the amount of I/O the database must perform by 100%. Anyone know if this is true? Update: The above was not really my…
mbalsam
  • 611
  • 1
  • 6
  • 16
5
votes
1 answer

Need some clarification about LC_COLLATE and LC_CTYPE

I have gone through the official PostgreSQL documentation to learn about LC_COLLATE and LC_CTYPE, but I still don't understand them correctly. Can anyone help me understand these concepts and their impact, especially when we are trying to…
5
votes
2 answers

Can Excel Sort Differently Than Its Default U.S. Character Set?

My question is basically the opposite of THIS ONE (which had a database-based solution I can't use here). I use SAP, which sorts characters this way: 0-9, A-Z, _ but I'm downloading data into Excel and manipulating ranges dependent on correct SAP…
wiigame
  • 153
  • 7
4
votes
3 answers

vb.net character set

According to MSDN vb.net uses this extended character set. In my experience it actually uses this: What am I missing? Why does it say it uses the one and uses the other? Am I doing something wrong? Is there some sort of conversion tool to the…
Connor Albright
  • 723
  • 4
  • 13
  • 29
4
votes
2 answers

Determining ISO-8859-1 vs US-ASCII charset

I am trying to determine whether to use PrintWriter pw = new PrintWriter(outputFilename, "ISO-8859-1"); or PrintWriter pw = new PrintWriter(outputFilename, "US-ASCII"); I was reading All about character sets to determine the character set of an…
vikingsteve
  • 38,481
  • 23
  • 112
  • 156
3
votes
0 answers

Why is Degrees-symbol ° not printing on Mac OS machine, but does print OK on Windows 10

I have a short C++ program to calculate the wind-chill index given a temperature and wind speed. It's working fine on a Windows 10 machine and outputting exactly as it should. To print the degrees symbol ° I'm using static_cast
JMBaker
  • 502
  • 3
  • 7
3
votes
1 answer

Can individual tags override the Character Set in the Specific Character Set (0008,0005)

If I create a DICOM object with a basic single byte Specific Character Set like (0008,0005) = ISO_IR 100, can one of the tags use a different 2-byte Character set? For example can Patient Name (0010,0010) be encoded in Simplified Chinese (ISO 2022…
3
votes
2 answers

Why is there a need to add a '0' to indexes in order to access array values?

I am confused with this line: sum += a[s[i] - '0']; To give some context, this is the rest of the code: #include using namespace std; int main() { int a[5]; for (int i = 1; i <= 4; i++) cin >> a[i]; string s; …
Zachary
  • 31
  • 1
3
votes
2 answers

Setting character set on MySQL connection does not work

I am trying to set a very simple session connection variable on MySQL, but it doesn't do anything. The queries run below do not cause any errors, but the character set of the MySQL connection won't be changed. If I configure the default values for…
user2180613
  • 739
  • 6
  • 21