33

I've got a file whose format I'm altering via a Python script. I have several camel-cased strings in this file where I just want to insert a single space before each capital letter - so "WordWordWord" becomes "Word Word Word".

My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?

Cœur
Electrons_Ahoy

10 Answers

60

You could try:

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'
Greg Hewgill
38

If there are consecutive capitals, Greg's result may not be what you are looking for, since the \w consumes the character in front of the capital letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
14

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

For example:

"SimpleHTTPServer" => "Simple HTTP Server"
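
To make the behaviour on acronyms easier to check, here is a minimal sketch that wraps the pattern above in a function. The names CAMEL_BOUNDARY and split_camel are made up for this illustration; only the regex itself comes from the answer.

```python
import re

# Same pattern as above: a lowercase letter followed by a capital, or a
# capital followed by a capital-plus-lowercase pair (the end of an acronym).
CAMEL_BOUNDARY = re.compile(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))')


def split_camel(text):
    """Insert a space at each camel-case boundary (hypothetical helper)."""
    return CAMEL_BOUNDARY.sub(r'\1 ', text)


print(split_camel("SimpleHTTPServer"))          # Simple HTTP Server
print(split_camel("SimpleHTTPServer").split())  # ['Simple', 'HTTP', 'Server']
```

Calling .split() on the result gives the list form, if a list of words is what you are after.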
Community
Markus Jarderot
14

Perhaps shorter:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
'Do I Think This Is A Better Answer?'
tzot
14

Maybe you would be interested in one-liner implementation without using regexp:

''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()
Yaroslav Surzhikov
5

With regexes you can do this:

re.sub(r'([A-Z])', r' \1', text)

Of course, that only works for ASCII characters; if you want to handle Unicode, it's a whole new can of worms :-)
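
One way to sidestep part of the Unicode problem, offered as a sketch rather than a complete fix: in Python 3, \w in a str pattern already matches Unicode word characters, and str.isupper() recognises non-ASCII capitals, so the case test can be moved into a replacement function. The name space_caps_unicode is made up for this illustration.

```python
import re


def space_caps_unicode(text):
    """Put a space before any uppercase character that follows a word character."""
    def add_space(match):
        ch = match.group()
        # str.isupper() is Unicode-aware, unlike the ASCII-only class [A-Z].
        return " " + ch if ch.isupper() else ch

    # (?<=\w)\w visits every word character that directly follows another one,
    # so the first character of the string never gets a leading space.
    return re.sub(r"(?<=\w)\w", add_space, text)


print(space_caps_unicode("WordWordWord"))
print(space_caps_unicode("ÉcoleÉté"))
```

This still does nothing special for acronyms or combining characters, so treat it as a starting point only.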

Dan Lenski
3

If you have acronyms, you probably do not want spaces between their letters. This two-stage regex keeps acronyms intact (and also treats punctuation and other non-uppercase characters as boundaries, so a capital that follows one gets a space in front of it):

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))

The output will be: 'Dave Is AFK Right Now! Cool'

David Underhill
1

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''

for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '

    new_text += letter
Yaroslav Surzhikov
monkut
  • This has the same problem as Dan's - you'll get extra spaces before capitals even if they aren't needed. – Brian Oct 14 '08 at 08:34
  • True, I've edited it to add a flag... I admit it's a little cumbersome, but it may be easier to remember than a regex. – monkut Oct 15 '08 at 01:01
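
The flag mentioned in the comment above does not appear in the code as shown in the answer. A minimal sketch of what such a version might look like follows; the function name add_spaces and the variable prev_was_space are assumptions for illustration, not monkut's actual edit.

```python
def add_spaces(text):
    """Insert a space before each capital unless one is already there."""
    pieces = []
    # Start as if a space came first, so no leading space is added.
    prev_was_space = True
    for letter in text:
        if letter.isupper() and not prev_was_space:
            pieces.append(' ')
        pieces.append(letter)
        prev_was_space = letter.isspace()
    return ''.join(pieces)


print(add_spaces('WordWordWord'))  # Word Word Word
print(add_spaces('Word Word'))     # Word Word
```

Collecting the pieces in a list and joining once also avoids repeated string concatenation inside the loop.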
0

I think regexes are the way to go here, but just to give a pure Python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

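def splitCaps(s):
    result = []
    # Pad with a space so the last character also gets a (ch, nxt) pair.
    for ch, nxt in window(s+" ", 2):  # nxt, to avoid shadowing the builtin next()
        result.append(ch)
        if nxt.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)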

window() is a utility function I use to operate on a sliding window of items, defined as:

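import collections, itertools

def window(it, winsize, step=1):
    it = iter(it)  # Ensure we have an iterator
    l = collections.deque(itertools.islice(it, winsize))
    while True:  # Continue until the underlying iterator is exhausted.
        yield tuple(l)
        for i in range(step):
            try:
                l.append(next(it))  # next(it) replaces Python 2's it.next()
            except StopIteration:
                return  # Python 3.7+ (PEP 479): end the generator explicitly
            l.popleft()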
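
For a window of exactly two items, the same pairing can be had from zip() without a helper. This is a sketch of that alternative under the same logic as splitCaps; the name split_caps_pairwise is made up for this illustration.

```python
def split_caps_pairwise(s):
    """Same idea as splitCaps, pairing each character with its successor via zip()."""
    padded = s + " "  # pad so the final character still has a successor
    result = []
    for ch, nxt in zip(padded, padded[1:]):
        result.append(ch)
        # Add a space only if the next character is a capital and we are
        # not already sitting on whitespace.
        if nxt.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)


print(split_caps_pairwise("WordWordWord"))  # Word Word Word
```

Unlike the generic window(), this only covers the size-2 case, which is all this problem needs.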
Brian
0

Late to this old thread, but I wanted to try an option for one of my own requirements. re.sub() is of course the neat solution, but here is a one-liner for when the re module isn't (or shouldn't be) imported.

st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' ' + s if s.isupper() else s for s in st]).lstrip())
DaveL17
Srini