
I'm trying to find all lines that are all caps using regex, and so far I've tried this:

re.findall(r'\b\n|[A-Z]+\b', kaizoku)

The text I'm searching is as follows:

TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!

I want it to return

TRAFALGAR LAW
MONKEY D LUFFY
DOFLAMINGO'S UNDERLINGS:

But it's returning something else, namely this:

TRAFALGAR
LAW
Y
MONKEY
D
LUFFY
N
DOFLAMINGO'
S
UNDERLINGS:
N

EDIT So far I really think the best fit for the answer is @Jan's answer

rx = re.compile(r"^([A-Z ':]+$)\b", re.M)
rx.findall(string)

EDIT2 Found out what's wrong, thanks!

Sunny League

3 Answers


Brief

No need for regex: Python strings have the method `isupper()`, documented as follows:

Return true if all cased characters[4] in the string are uppercase and there is at least one cased character, false otherwise.

[4] Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase).
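As a quick illustration of the definition quoted above, the following sketch shows how `isupper()` treats characters that have no case (digits, spaces, punctuation). The sample strings other than the ones from the question are made up for this example.

```python
# Characters without case (digits, spaces, punctuation) are ignored by isupper(),
# but at least one cased character must be present for it to return True.
samples = ["DOFLAMINGO'S UNDERLINGS:", "MONKEY D LUFFY 2", "Now!", "1234", ""]

results = {s: s.isupper() for s in samples}
for s, r in results.items():
    print(repr(s), r)
```

The last two strings return False because they contain no cased character at all, which is usually the desired behaviour when looking for all-caps lines.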


Code


a = [
    "TRAFALGAR LAW",
    "You shall not be the pirate king.",
    "MONKEY D LUFFY",
    "Now!",
    "DOFLAMINGO'S UNDERLINGS:",
    "Noooooo!",
]

for s in a:
    print(s.isupper())  # print() call form, as in the other answers

Result

True
False
True
False
True
False
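To get the lines themselves rather than a list of booleans, the same check can be used as a filter. This is a minimal sketch that assumes the input is one multi-line string, named `kaizoku` as in the question.

```python
# Input text from the question, held in a single multi-line string.
kaizoku = """TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!
"""

# Split into lines and keep only those whose cased characters are all uppercase.
upper_lines = [line for line in kaizoku.splitlines() if line.isupper()]
print(upper_lines)
```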
ctwheels

Here you go

import re

string = """TRAFALGAR LAW
You shall not be the pirate king.
MONKEY D LUFFY
Now!
DOFLAMINGO'S UNDERLINGS:
Noooooo!
"""

rx = re.compile(r"^([A-Z ':]+$)", re.M)

UPPERCASE = [line for line in string.split("\n") if rx.match(line)]
print(UPPERCASE)

Or:

rx = re.compile(r"^([A-Z ':]+$)", re.M)

UPPERCASE = rx.findall(string)
print(UPPERCASE)

Both will yield

['TRAFALGAR LAW', 'MONKEY D LUFFY', "DOFLAMINGO'S UNDERLINGS:"]
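The `re.M` (multiline) flag is what lets `^` and `$` anchor at every line instead of only at the start and end of the whole string. A small sketch of the difference, using a shortened version of the sample text:

```python
import re

text = "TRAFALGAR LAW\nYou shall not be the pirate king.\nMONKEY D LUFFY\n"
pattern = r"^([A-Z ':]+$)"

# Without re.M, ^ only matches at the very start of the string and $ only at
# the very end, so no single line can satisfy both anchors here.
without_flag = re.findall(pattern, text)

# With re.M, ^ and $ match at the start and end of each line.
with_flag = re.findall(pattern, text, re.M)

print(without_flag)
print(with_flag)
```

Note that the character class lists the allowed characters explicitly, so a line containing a digit or a hyphen would not match unless those are added to the class.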
Jan

You can use [A-Z\d_\W] to check for uppercase letters along with digits, underscores, and non-alphanumeric characters:

import re
s = ["TRAFALGAR LAW", "You shall not be the pirate king.", "MONKEY D LUFFY", "Now!", "DOFLAMINGO'S UNDERLINGS:", "Noooooo!"]
new_s = [i for i in s if re.findall(r'^[A-Z\d_\W]+$', i)]  # raw string avoids invalid-escape warnings

Output:

['TRAFALGAR LAW', 'MONKEY D LUFFY', "DOFLAMINGO'S UNDERLINGS:"]
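One caveat with this character class is that it also accepts lines with no letters at all, such as a line of digits and punctuation. If that matters, one possible variation (an assumption on my part, not part of the original answer) is to add a lookahead that requires at least one uppercase letter. The line "1234 !!" below is a made-up example.

```python
import re

# Pattern from the answer: any mix of A-Z, digits, underscore, and non-word characters.
loose = re.compile(r'^[A-Z\d_\W]+$')
# Hypothetical stricter variant: additionally require at least one uppercase letter.
strict = re.compile(r'^(?=.*[A-Z])[A-Z\d_\W]+$')

lines = ["TRAFALGAR LAW", "1234 !!", "Now!", "DOFLAMINGO'S UNDERLINGS:"]

loose_hits = [s for s in lines if loose.search(s)]
strict_hits = [s for s in lines if strict.search(s)]

print(loose_hits)
print(strict_hits)
```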
Ajax1234
  • Wouldn't `[A-Z\d_\W]` be better as it includes digits and underscore (in the case that they may be used)? – ctwheels Dec 06 '17 at 21:57