What's the difference between uppercase and titlecase? Frankly, I had never heard of titlecase before.

In Java there are separate methods for both:

  • Character.isTitleCase(char)
  • Character.isUpperCase(char)

Some websites define it as follows:

TitleCase: Matches characters that combine an uppercase letter with a lowercase letter, such as Nj and Dz

But there must be more to it: the isTitleCase(char) method accepts only one character. If that definition were complete, the method would need at least two characters.

Glorfindel
bvdb

3 Answers

It accepts only one Unicode character. It turns out that DŽ (U+01C4) actually is a single character: look at how it renders in a monospaced font: DŽ. The titlecase version is Dž (U+01C5), and a lowercase version, dž (U+01C6), exists as well.
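
To check this against the JDK, here is a minimal sketch. The class name `TitleCaseDemo` and the helper `caseOf` are made up for illustration; only the `java.lang.Character` calls are real API. It reports the case category of each of the three forms and confirms that each one fits in a single `char`.

```java
public class TitleCaseDemo {
    // The three case forms of the DZ-with-caron digraph; each is one UTF-16 char.
    static final char UPPER = '\u01C4'; // DŽ
    static final char TITLE = '\u01C5'; // Dž
    static final char LOWER = '\u01C6'; // dž

    /** Hypothetical helper: names the case category Character reports for c. */
    static String caseOf(char c) {
        if (Character.isTitleCase(c)) return "titlecase";
        if (Character.isUpperCase(c)) return "uppercase";
        if (Character.isLowerCase(c)) return "lowercase";
        return "uncased";
    }

    public static void main(String[] args) {
        for (char c : new char[] {UPPER, TITLE, LOWER}) {
            // String length 1 shows the digraph really is a single char.
            System.out.printf("U+%04X %s length=%d toTitleCase=U+%04X%n",
                    (int) c, caseOf(c), String.valueOf(c).length(),
                    (int) Character.toTitleCase(c));
        }
    }
}
```

All three forms map to U+01C5 under `Character.toTitleCase`, which is why a single-`char` method signature is enough.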

ILMTitan
Glorfindel
  • wow, I had no idea! I guess this is something that is used in specific languages? (Not in English or French, right?) – bvdb Aug 02 '15 at 11:30
  • The closest situation I can think of is œ and æ, but they behave differently: http://www.fileformat.info/info/unicode/char/0153/index.htm – Glorfindel Aug 02 '15 at 12:11
  • @bvdb The reason this happens is for round-trip compatibility with legacy encodings in which such things occurred. For example, in the MacRoman encoding, byte 0xDE maps to U+FB01 `LATIN SMALL LIGATURE FI` (so a **fi** character) and byte 0xDF maps to U+FB02 `LATIN SMALL LIGATURE FL` (a **fl** character). Round-trip guarantees allow you to losslessly convert from MacRoman to Unicode and back to MacRoman again without anything changing. – tchrist Aug 29 '15 at 20:19

I know it's already been answered before but I'm just adding a really quick breakdown:

Combined characters:

  • DŽ = Uppercase Only
  • dž = Lowercase Only
  • Dž = Titlecase Only

Single characters:

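  • D = Uppercase, and its own titlecase mapping (Character.toTitleCase('D') returns 'D', although Character.isTitleCase('D') is false)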
  • d = Lowercase Only
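
The breakdown above can be checked with a short sketch. The class `CaseBreakdown` and its `describe` method are hypothetical names; the `Character` methods are standard. One detail worth seeing in the output: for plain 'D', `Character.isTitleCase` returns false (its Unicode category is uppercase letter), while `Character.toTitleCase('D')` returns 'D' itself, which is the sense in which 'D' doubles as its own titlecase form.

```java
public class CaseBreakdown {
    /** Hypothetical helper: summarizes how Character classifies and maps c. */
    static String describe(char c) {
        return String.format("U+%04X upper=%b title=%b lower=%b toTitleCase=U+%04X",
                (int) c,
                Character.isUpperCase(c),
                Character.isTitleCase(c),
                Character.isLowerCase(c),
                (int) Character.toTitleCase(c));
    }

    public static void main(String[] args) {
        // DŽ, dž, Dž (combined characters), then plain D and d.
        char[] samples = {'\u01C4', '\u01C6', '\u01C5', 'D', 'd'};
        for (char c : samples) {
            System.out.println(describe(c));
        }
    }
}
```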
nxasdf

WHAT IS TITLECASE:

  • in some languages and scripts, there are digraph letters - i.e. a single Unicode code point which is a combination of 2 human-readable characters, displayed as a kind of a combination glyph.

  • only digraphs can be titlecase - i.e. the lowercase digraph of "dz" corresponds to uppercase "DZ" and titlecase "Dz".

  • so, "UPPERCASE", "Titlecase" and "lowercase"

It accepts only one Unicode character.

Not exactly correct.

The Greek script has many titlecase characters, and there are also more Latin titlecase characters than "DZ".

To view all titlecase characters in the world, start Excel (or the free Power BI Desktop app), then Data/Get Data/Blank Query, and execute the following Power Query M language query by copy-pasting it to Query/Advanced Editor:

let
    // Download the Unicode Character Database file (semicolon-separated)
    downloaded = Web.Contents("https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt"),
    csv = Csv.Document(downloaded,
        [Delimiter = ";",
         Encoding = 65001, // UTF-8
         QuoteStyle = QuoteStyle.None // treat quotes as ordinary characters; every line break ends a row
        ]),
    // Keep only code point, name, and general category
    #"Removed Other Columns" = Table.SelectColumns(csv, {"Column1", "Column2", "Column3"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"Column1", "Character code"}, {"Column2", "Character name"}, {"Column3", "Category"}}),
    // Convert the hexadecimal code point into the character itself
    #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Glyph", each Character.FromNumber(Expression.Evaluate("0x" & [Character code]))),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom", {"Character code", "Glyph", "Character name", "Category"}),
    // "Lt" is the Unicode general category for titlecase letters
    #"Filtered Rows" = Table.SelectRows(#"Reordered Columns", each [Category] = "Lt")
in
    #"Filtered Rows"
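
Since the question is about Java, roughly the same list can be produced without Excel. This is a sketch, not an equivalent of the query above: it scans every code point and keeps those that the running JDK classifies as `Character.TITLECASE_LETTER` (category Lt), so the result reflects the Unicode version bundled with that JDK rather than the latest UnicodeData.txt. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ListTitleCase {
    /** Collects every code point whose general category is Lt (titlecase letter). */
    static List<Integer> titleCaseCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.getType(cp) == Character.TITLECASE_LETTER) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print code point, glyph, and Unicode name for each titlecase letter.
        for (int cp : titleCaseCodePoints()) {
            System.out.printf("U+%04X %s %s%n",
                    cp, new String(Character.toChars(cp)), Character.getName(cp));
        }
    }
}
```

The output should include the four Latin digraphs (U+01C5, U+01C8, U+01CB, U+01F2) along with a larger group of Greek letters in the U+1F88 to U+1FFC range.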
Max
  • what is the titlecase of 'ij'? – Ṃųỻịgǻňạcểơửṩ Oct 28 '21 at 18:20
  • @Ṃųỻịgǻňạcểơửṩ, there is no titlecase form of "ij" as a single Unicode character; Unicode encodes only the uppercase ligature Ĳ (U+0132) and the lowercase ligature ĳ (U+0133). You could just type it as 2 separate characters "Ij" for the desired effect, but only the single letter "I" can be considered titlecase, not "Ij" combined, because they're 2 separate characters. – nxasdf Nov 08 '21 at 18:13