95

Is there a standard way in Python to titlecase a string (i.e. each word starts with an uppercase character and all remaining cased characters are lowercase) while leaving small words such as and, in, and of lowercased?

martineau
yassin

9 Answers

159

There are a few problems with this. If you use split and join, some white space characters will be ignored. The built-in capitalize and title methods do not ignore white space.

>>> 'There     is a way'.title()
'There     Is A Way'
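To make the whitespace point concrete, here is a small sketch comparing the splitting variants on the same sample string. It only uses built-in behavior; the variable name `s` is just for illustration.

```python
import re

s = 'There     is a way'

# split() with no argument treats any run of whitespace as one separator,
# so the extra spaces are lost after joining.
print(' '.join(s.split()))          # There is a way

# Splitting on a single space yields empty strings for each extra space,
# so joining with ' ' reproduces the original spacing.
print(' '.join(s.split(' ')))       # There     is a way

# re.split(' ', s) produces the same list as s.split(' ').
print(re.split(' ', s) == s.split(' '))   # True
```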

If a sentence starts with an article, you do not want the first word of a title in lowercase.

Keeping these in mind:

import re

def title_except(s, exceptions):
    word_list = re.split(' ', s)       # re.split behaves as expected
    final = [word_list[0].capitalize()]
    for word in word_list[1:]:
        final.append(word if word in exceptions else word.capitalize())
    return " ".join(final)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except('there is a    way', articles))
# There is a    Way
print(title_except('a whim   of an elephant', articles))
# A Whim   of an Elephant
dheerosaur
  • Why is `re` necessary? There's a `"".split` function that does the same. – wizzwizz4 Mar 19 '17 at 18:41
  • 1
    @wizzwizz4: `str.split` doesn't consider contiguous spaces. `re.split` retains spaces. So, this function doesn't eat up any spaces. – dheerosaur Mar 19 '17 at 20:01
  • @dheerosaur I thought that `"".split()` didn't consider them but `"".split(" ")` did. – wizzwizz4 Mar 19 '17 at 20:02
  • 1
    Your snippet won't work correctly for `title_except('a whim of aN elephant', articles)` case. You could use `word.lower() in exceptions` filtering condition to fix it. – Dariusz Walczak Sep 08 '17 at 07:27
  • @dheerosaur I am looking for a way to capitalize any word that follows not only an article but also a number. Could you make an addition to your answer that demonstrates this? E.g. `2001 a Space Odyssey` should return `2001 A Space Odyssey`, where the `a` is capitalized as it follows a number. Thanks in advance. – ProGrammer Jan 04 '18 at 22:58
  • @ProGrammer the `a` is not capitalized because it is after a number, but because it is the first word of the subtitle. The way to recognize a subtitle is when it comes after a colon `:`, so the actual, proper title should be `2001: A Space Odyssey`. – Josh Coady Feb 22 '18 at 06:35
  • `is` is not an article, it is a verb and it should be capitalized. `of` is also not an article, it is a preposition, but it does not get capitalized. You generally do not capitalize articles, prepositions, and conjunctions if they are 3 letters or less and they are not the first word in the title or subtitle. More details on title casing: http://blog.apastyle.org/apastyle/2012/03/title-case-and-sentence-case-capitalization-in-apa-style.html – Josh Coady Feb 22 '18 at 06:38
  • @JoshuaCoady That's a valid point that I have also contemplated before. The problem, however, lies in the fact that macOS, my primary operating system, does not support `:` in file paths - I am looking to use it in a script that renames files. That's why I am leaving `:` out of the name (or replace it with `-` depending on the case). I think film titles are generally rather loose on (title/naming) conventions. I shall take a look at the article and see what conventions I may be able to carry over into the code. – ProGrammer Feb 22 '18 at 06:47
  • @ProGrammer A word with a trailing dash to indicate a subtitle would work. You'll need something to indicate it, otherwise you won't be able to tell the difference between the `a` in something like `The Investigator: A British Crime Story` and `In a World..` – Josh Coady Feb 23 '18 at 01:01
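The suggestions in the comments above (case-insensitive matching, and capitalizing the word that follows a colon or a trailing dash) could be combined roughly as follows. This is only a sketch: `title_except_ci` is a hypothetical name, and treating any word ending in `:` or `-` as a subtitle marker is an assumption, not an established rule.

```python
import re

def title_except_ci(s, exceptions):
    """Title-case s, keeping words in `exceptions` lowercase unless they
    start the title or follow a word ending in ':' or '-' (assumed to
    mark the start of a subtitle)."""
    minor = {e.lower() for e in exceptions}
    result = []
    start = True  # True for the first word and right after a subtitle marker
    for word in re.split(' ', s):   # single-space split keeps runs of spaces
        if not word:                # empty string produced by an extra space
            result.append(word)
            continue
        if word.lower() in minor and not start:
            result.append(word.lower())
        else:
            result.append(word.capitalize())
        start = word.endswith((':', '-'))
    return ' '.join(result)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except_ci('a whim of aN elephant', articles))
# A Whim of an Elephant
print(title_except_ci('2001: a space odyssey', articles))
# 2001: A Space Odyssey
```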
64

Use the titlecase.py module! Works only for English.

>>> from titlecase import titlecase
>>> titlecase('i am a foobar bazbar')
'I Am a Foobar Bazbar'

GitHub: https://github.com/ppannuto/python-titlecase

Brad Solomon
Etienne
  • 1
    The titlecase module doesn't work if the string you are converting contains a number anywhere in it. – Troy Jul 24 '13 at 23:57
  • 1
    @Troy it seems the number issue is fixed, or I did not hit your edge case. Ex: titlecase('one 4 two') -> 'One 4 Two'. Now titlecase('1one') -> '1one', but '1one'.title() -> '1One', though this latter case is an edge case and I'm not sure '1One' is the correct titling. I'm also not concerned enough to grab my grammar book. – brent.payne Sep 22 '14 at 04:53
  • Won't work in the case of "321 A BROADWAY STREET" where I get "321 a Broadway Street". Using the solution proposed by dheerosaur above produces "321 A Broadway Street". – MoreScratch Oct 28 '16 at 20:49
  • Also nice, it leaves acronyms in the title untouched. 'development of innovative TIaSR' becomes 'Development of Innovative TIaSR'. – Matthias Arras Jun 11 '20 at 09:16
24

There are these methods:

>>> mytext = u'i am a foobar bazbar'
>>> print(mytext.capitalize())
I am a foobar bazbar
>>> print(mytext.title())
I Am A Foobar Bazbar

There's no lowercase article option. You'd have to code that yourself, probably by using a list of articles you want to lower.
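One possible way to "code that yourself" is to call `title()` first and then lowercase the listed words again with a regular expression. This is a sketch under assumptions: `title_then_lower` is a hypothetical helper, the word list is whatever you choose to pass in, and it inherits the known quirks of `str.title()` (for example around apostrophes).

```python
import re

def title_then_lower(text, minor_words):
    """Apply str.title(), then lowercase the given minor words again,
    except when one of them is the very first word of the string."""
    titled = text.title()
    alternatives = '|'.join(re.escape(w.title()) for w in minor_words)
    # (?<!^) skips a match at the start of the string; \b ensures that
    # only whole words are matched.
    pattern = r'(?<!^)\b(' + alternatives + r')\b'
    return re.sub(pattern, lambda m: m.group(0).lower(), titled)

print(title_then_lower('i am a foobar bazbar', ['a', 'an', 'the']))
# I Am a Foobar Bazbar
print(title_then_lower('the end of the road', ['the', 'of']))
# The End of the Road
```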

nosklo
15

Stuart Colville has made a Python port of a Perl script written by John Gruber that converts strings into title case but avoids capitalizing small words, based on rules from the New York Times Manual of Style, and caters for several special cases.

Some of the cleverness of these scripts:

  • they do not capitalize small words like if, in, of, on, etc., and will un-capitalize them if they’re erroneously capitalized in the input.

  • the scripts assume that words with capitalized letters other than the first character are already correctly capitalized. This means they will leave a word like “iTunes” alone, rather than mangling it into “ITunes” or, worse, “Itunes”.

  • they skip over any words with inline dots; “example.com” and “del.icio.us” will remain lowercase.

  • they have hard-coded hacks specifically to deal with odd cases, like “AT&T” and “Q&A”, both of which contain small words (at and a) which normally should be lowercase.

  • The first and last word of the title are always capitalized, so input such as “Nothing to be afraid of” will be turned into “Nothing to Be Afraid Of”.

  • A small word after a colon will be capitalized.

You can download it here.
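The rules listed above can be illustrated with a much-simplified sketch. This is not Gruber's or Colville's code: `simple_titlecase` and `SMALL_WORDS` are hypothetical names, the word list is an assumption, and punctuation attached to small words (such as "of,") is not handled.

```python
# Assumed small-word list, for illustration only.
SMALL_WORDS = {'a', 'an', 'and', 'as', 'at', 'but', 'by', 'for', 'if',
               'in', 'of', 'on', 'or', 'the', 'to'}

def simple_titlecase(title):
    words = title.split()
    out = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        after_colon = i > 0 and words[i - 1].endswith(':')
        if any(c.isupper() for c in word[1:]):
            out.append(word)            # mixed case such as iTunes or AT&T: keep
        elif '.' in word[:-1]:
            out.append(word)            # inline dot such as example.com: keep
        elif word.lower() in SMALL_WORDS and not (first_or_last or after_colon):
            out.append(word.lower())    # small word in the middle: lowercase
        else:
            out.append(word[0].upper() + word[1:])
    return ' '.join(out)

print(simple_titlecase('Nothing to be afraid of'))
# Nothing to Be Afraid Of
print(simple_titlecase('using iTunes on example.com today'))
# Using iTunes on example.com Today
print(simple_titlecase('2001: a space odyssey'))
# 2001: A Space Odyssey
```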

BioGeek
4
capitalize (word)

This should do. I get it differently.

>>> mytext = u'i am a foobar bazbar'
>>> mytext.capitalize()
u'I am a foobar bazbar'
>>>

OK, as said in the reply above, you have to write a custom capitalize:

mytext = u'i am a foobar bazbar'

def xcaptilize(word):
    # Leave the listed small words untouched; capitalize everything else.
    skipList = ['a', 'an', 'the', 'am']
    if word not in skipList:
        return word.capitalize()
    return word

k = mytext.split(" ")
l = map(xcaptilize, k)
print(" ".join(l))

This outputs

I am a Foobar Bazbar
pyfunc
  • That's not what I want. I want to get "I am a Foobar Bazbar" – yassin Sep 16 '10 at 16:53
  • @Yassin Ezbakhe : Edited my answer, this should work for you. The list of articles can be easily lifted from any dictionary – pyfunc Sep 16 '10 at 17:12
2

Python 2.7's title method has a flaw in it.

value.title()

will return Carpenter'S Assistant when value is Carpenter's Assistant

The best solution is probably the one from @BioGeek using titlecase from Stuart Colville, which is the same solution proposed by @Etienne.
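The apostrophe behavior is easy to reproduce, and Python 3 behaves the same way. As a hedged alternative that stays within the standard library, `string.capwords` splits on whitespace only, so it leaves the letter after the apostrophe alone; note that with the default separator it also collapses runs of whitespace into single spaces.

```python
import string

value = "carpenter's assistant"

# str.title() treats the apostrophe as a word boundary,
# so the "s" after it is uppercased.
print(value.title())            # Carpenter'S Assistant

# string.capwords() capitalizes whitespace-separated words only.
print(string.capwords(value))   # Carpenter's Assistant
```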

boatcoder
2
not_these = ['a', 'the', 'of']
thestring = 'the secret of a disappointed programmer'
print(' '.join(word
               if word in not_these
               else word.title()
               for word in thestring.capitalize().split(' ')))
"""Output:
The Secret of a Disappointed Programmer
"""

Calling capitalize() first uppercases the leading word, so it no longer matches the lowercase entries in not_these and stays capitalized even when it is an article.

Tony Veijalainen
1

One-liner using a list comprehension and the ternary operator:

reslt = " ".join([word.title() if word not in "the a on in of an".split() else word for word in "Wow, a python one liner for titles".split(" ")])
print(reslt)

Breakdown:

for word in "Wow, a python one liner for titles".split(" ") splits the string into a list and starts a for loop (in the list comprehension).

word.title() if word not in "the a on in of an".split() else word uses the built-in method title() to title-case the word if it is not one of the listed small words. The small-word string is split into a list so that in tests whole words; against the raw string it would test substrings, and a word such as "he" would be skipped.

" ".join joins the list elements with a separator of " " (a space).
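One detail worth checking in a one-liner like this is what `in` is tested against: on a string it checks for substrings, on a list or set it checks for whole elements. The sketch below shows the difference; `MINOR` and `one_liner_title` are hypothetical names used only for illustration.

```python
MINOR = "the a on in of an"

# Against the raw string, `in` is a substring test: "he" is found inside "the".
print("he" in MINOR)           # True
# Against the split list, only whole words match.
print("he" in MINOR.split())   # False

def one_liner_title(text, minor=frozenset(MINOR.split())):
    # Title-case every word that is not in the set of minor words.
    return " ".join(w if w in minor else w.title() for w in text.split(" "))

print(one_liner_title("Wow, a python one liner for titles"))
# Wow, a Python One Liner For Titles
print(one_liner_title("he said no"))
# He Said No
```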

0

One important case that is not being considered is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid down-casing. With this approach, acronyms that are already upper case remain in upper case. The following code is a modification of that originally provided by dheerosaur.

# This is an attempt to provide an alternative to ''.title() that works with 
# acronyms.
# There are several tricky cases to worry about in typical order of importance:
# 0. Upper case first letter of each word that is not a 'minor' word.
# 1. Always upper case first word.
# 2. Do not down case acronyms
# 3. Quotes
# 4. Hyphenated words: drive-in
# 5. Titles within titles: 2001 A Space Odyssey
# 6. Maintain leading spacing
# 7. Maintain given spacing: This is a test.  This is only a test.

# The following code addresses 0-3 & 7.  It was felt that addressing the others 
# would add considerable complexity.


def titlecase(
    s,
    exceptions = (
        'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
        'for', 'in', 'of', 'on', 'per', 'to'
    )
):
    words = s.strip().split(' ')
        # split on single space to maintain word spacing
        # remove leading and trailing spaces -- needed for first word casing

    def upper(s):
        if s:
            if s[0] in '‘“"‛‟' + "'":
                return s[0] + upper(s[1:])
            return s[0].upper() + s[1:]
        return ''

    # always capitalize the first word
    first = upper(words[0])

    return ' '.join([first] + [
        word if word.lower() in exceptions else upper(word)
        for word in words[1:]
    ])


cases = '''
    CDC warns about "aggressive" rats as coronavirus shuts down restaurants
    L.A. County opens churches, stores, pools, drive-in theaters
    UConn senior accused of killing two men was looking for young woman
    Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,’ study reveals
    Maintain given spacing: This is a test.  This is only a test.
'''.strip().splitlines()

for case in cases:
    print(titlecase(case))

When run, it produces the following:

CDC Warns About "Aggressive" Rats as Coronavirus Shuts Down Restaurants
L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
UConn Senior Accused of Killing Two Men Was Looking for Young Woman
Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,’ Study Reveals
Maintain Given Spacing: This Is a Test.  This Is Only a Test.
August West