regular expression for title case - Python

Question

I need to find a combination of 2 consecutive title case words.

This is my code so far,

import re

text = 'Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
rex = r'[A-Z][a-z]+\s+[A-Z][a-z]+'

re.findall(rex, text)

This gives me,

['Moh Shai', 'This Is', 'Python Code', 'Needs Some']

However, I need all the combinations, including the overlapping ones. Something like,

['Moh Shai', 'This Is', 'Python Code', 'Needs Some','Some Expertise']

Can someone please help?

Does [this](http://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches) help? — TigerhawkT3, Apr 19 '16 at 23:34
If you can install a third-party module, the easiest way is with the [regex module](https://pypi.python.org/pypi/regex), which supports an `overlapped=True` flag on `findall()`. — kindall, Apr 19 '16 at 23:39
@kindall you are awesome. That works great! Can you please post an answer so I may accept? — Md. Mohsin, Apr 19 '16 at 23:41
Please see: http://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches — user3516726, Apr 19 '16 at 23:49

score 4 · Accepted Answer · answered Apr 19 '16 at 23:38

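You can use a regex lookahead in combination with the re.finditer function to get the desired outcome. A lookahead is zero-width, so each match consumes no characters and the engine retries at every following position, which lets matches overlap; the capturing group inside the lookahead records the pair itself: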

import re

text='Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
rex=r'(?=([A-Z][a-z]+\s+[A-Z][a-z]+))'

matches = re.finditer(rex,text)
results = [match.group(1) for match in matches]

Now results will contain the information you need:

>>> results
['Moh Shai', 'This Is', 'Python Code', 'Needs Some', 'Some Expertise']

edit: For what it's worth, you don't really need the finditer function. Because the pattern contains exactly one capturing group, re.findall(rex,text) returns the captured strings directly, so you can replace those bottom two lines with your previous line for the same effect.
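As a quick check of that equivalence, here is a minimal sketch comparing the two calls on the question's text. The `\b` variant at the end is an optional addition that is not part of the answer; it assumes you do not want a pair to start in the middle of a longer word such as `McDonald`.

```python
import re

text = 'Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
rex = r'(?=([A-Z][a-z]+\s+[A-Z][a-z]+))'

# finditer yields match objects; the pair lives in group 1 because the
# lookahead itself matches an empty string.
via_finditer = [m.group(1) for m in re.finditer(rex, text)]

# With exactly one capturing group, findall returns that group's text.
via_findall = re.findall(rex, text)

# Optional variant (an assumption, not from the answer): \b anchors keep a
# pair from starting or ending inside a longer word.
rex_bounded = r'(?=(\b[A-Z][a-z]+\s+[A-Z][a-z]+\b))'
via_bounded = re.findall(rex_bounded, text)

print(via_findall)
# ['Moh Shai', 'This Is', 'Python Code', 'Needs Some', 'Some Expertise']
```

On this text all three lists are identical. The bounded pattern only behaves differently on input like `'McDonald Smith'`, where the unbounded one reports `'Donald Smith'`.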

This answer identifies only Title Case of 2 words, it would fail on "The United States Of America" — Uri Goren, Aug 12 '17 at 19:42

Uri Goren · Answer 2 · 2017-08-13T09:03:46.680

I came to this question by its title and was disappointed that the solution wasn't what I expected.

The accepted answer only works for titles of exactly 2 words.

This code returns all of the tokens that are in title case, without assuming anything about the number of words in the title:

import collections
import re

def title_case_to_token(c):
    # Wrap a matched run in <...>, joining its words with underscores.
    totoken = lambda s: s[0] + "<" + s[1:-2].replace(" ", "_") + ">" + s[-2:]
    # Pad the input so runs at the very start or end of the text can match.
    tokenized = re.sub(r"([\s.,;]([A-Z][a-z]+[\s.,;])+[^A-Z])", lambda m: totoken(m.group(0)), " " + c + " x")[1:-2]
    tokens = collections.Counter(re.findall(r"<\w+>", tokenized))
    return (tokens, tokenized)

For example

text='Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
tokens, tokenized = title_case_to_token(text)

Value of tokens

Counter({'<Hi>': 1, '<Moh_Shai>': 1, '<This_Is>': 1, '<Python_Code>': 1, '<Regex>': 1, '<Needs_Some_Expertise>': 1})

Note that `Needs_Some_Expertise` is also caught by this regex even though it has 3 words, and that single title case words such as `Hi` and `Regex` are tokenized too.

Value of tokenized

<Hi> my name is <Moh_Shai> and <This_Is> a <Python_Code> with <Regex> and <Needs_Some_Expertise>
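For comparison, here is a simpler sketch (not this answer's method) that matches each run of two or more consecutive title case words directly and then derives the overlapping pairs the original question asked for. The names `RUN`, `title_runs` and `consecutive_pairs` are made up for this example, and it assumes a title case word is one ASCII capital followed by one or more lowercase letters.

```python
import re

# One title case word followed by one or more further title case words.
RUN = re.compile(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b')

def title_runs(text):
    """Return each run of 2+ consecutive title case words."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return RUN.findall(text)

def consecutive_pairs(text):
    """Return every pair of adjacent words inside each run."""
    pairs = []
    for run in title_runs(text):
        words = run.split()
        pairs.extend(a + ' ' + b for a, b in zip(words, words[1:]))
    return pairs

text = 'Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
print(title_runs(text))
# ['Moh Shai', 'This Is', 'Python Code', 'Needs Some Expertise']
print(consecutive_pairs(text))
# ['Moh Shai', 'This Is', 'Python Code', 'Needs Some', 'Some Expertise']
```

With this approach `'The United States Of America'` comes back as a single run, and splitting it yields four pairs. Unlike the tokenizer above, single title case words such as `Hi` are skipped, and the pairs are rejoined with a single space regardless of the original whitespace.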

score 1 · Answer 3 · answered Apr 20 '16 at 00:46

If you can install a third-party module, the easiest way is with the regex module, which supports an overlapped=True flag on findall().
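A minimal sketch of that call, using the question's original pattern without any lookahead. The helper name `overlapping_matches` is hypothetical, and the standard-library fallback is an assumption added so the snippet still runs where `regex` is not installed; it is not part of this answer.

```python
import re

text = 'Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
rex = r'[A-Z][a-z]+\s+[A-Z][a-z]+'

def overlapping_matches(pattern, string):
    """Return all matches of pattern in string, including overlapping ones."""
    try:
        import regex  # third-party module: pip install regex
    except ImportError:
        # Fallback (assumption for this sketch): wrap the pattern in a
        # capturing lookahead and use the standard library instead.
        return [m.group(1) for m in re.finditer('(?=(' + pattern + '))', string)]
    return regex.findall(pattern, string, overlapped=True)

print(overlapping_matches(rex, text))
# ['Moh Shai', 'This Is', 'Python Code', 'Needs Some', 'Some Expertise']
```

The appeal of `overlapped=True` is that the pattern stays exactly as written in the question, with no lookahead or extra capturing group to read around.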

Answered by kindall.
