I want to separate a string at the point where a capital letter starts, but not if it's preceded by a hyphen, using regex in Python

Question

I'm having trouble finding a straightforward solution to the following problem.

I have a column in a DataFrame containing string items like:

'RosemontCentral'

'Dollard-des-OrmeauxEast'

I want to separate a string at the point where a capital letter starts, but not if it's preceded by a hyphen.

For example:

'RosemontCentral' to 'Rosemont Central'

'Dollard-des-OrmeauxEast' to 'Dollard-des-Ormeaux East'

I have the regex function below so far. It does a fairly good job with items such as the first one, where there are no hyphenated words, but not with the ones that have hyphens. Additionally, it adds an undesirable leading space at the very beginning of the string, like this:

' Dollard-des- Ormeaux East'

import re

def add_space(Neighborhood):
    return re.sub(r"([A-Z])", r" \1", Neighborhood)

df['Neighborhood'] =  df['Neighborhood'].apply(add_space)

df
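Running the question's pattern on both sample values shows where it goes wrong: `([A-Z])` matches every capital letter, including the first character of the string and the capital right after a hyphen. A minimal sketch, where the name `add_space_original` is hypothetical and used only to keep it apart from the fixed versions:

```python
import re

def add_space_original(neighborhood: str) -> str:
    # Pattern from the question: put a space before EVERY capital letter,
    # with no check on what precedes it.
    return re.sub(r"([A-Z])", r" \1", neighborhood)

print(repr(add_space_original("RosemontCentral")))
# ' Rosemont Central'
print(repr(add_space_original("Dollard-des-OrmeauxEast")))
# ' Dollard-des- Ormeaux East'
```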

Thank you for your time

Possible duplicate of [how-to-do-camelcase-split-in-python](https://stackoverflow.com/questions/29916065/how-to-do-camelcase-split-in-python) – Jun 10 '20 at 03:28

score 2 · Accepted Answer · answered Jun 10 '20 at 03:36


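You may try adding a lookbehind to your regex pattern which asserts that a non-dash character precedes the capital letter. Because the lookbehind needs an actual character before the capital, it also never matches at the start of the string, which gets rid of the leading space: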

import re

def add_space(Neighborhood):
    return re.sub(r'(?<=[^-])([A-Z])', r' \1', Neighborhood)

df['Neighborhood'] =  df['Neighborhood'].apply(add_space)
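The choice of a positive lookbehind on `[^-]` matters here. A negative lookbehind such as `(?<!-)` also reads as "not preceded by a hyphen", but it succeeds at the very start of the string, so it would bring the leading space back. A small sketch comparing the two; the helper `add_space_negative` is hypothetical and exists only for the comparison:

```python
import re

# Positive lookbehind from the answer: the capital must be preceded by some
# character that is not a hyphen. At index 0 there is no preceding character,
# so the first capital is never matched (no leading space).
POSITIVE = r"(?<=[^-])([A-Z])"

# Negative lookbehind, for comparison only: "not preceded by a hyphen" is also
# true at index 0, so the first capital IS matched.
NEGATIVE = r"(?<!-)([A-Z])"

def add_space(neighborhood: str) -> str:
    return re.sub(POSITIVE, r" \1", neighborhood)

def add_space_negative(neighborhood: str) -> str:
    return re.sub(NEGATIVE, r" \1", neighborhood)

for name in ["RosemontCentral", "Dollard-des-OrmeauxEast"]:
    print(repr(add_space(name)), repr(add_space_negative(name)))
# 'Rosemont Central' ' Rosemont Central'
# 'Dollard-des-Ormeaux East' ' Dollard-des-Ormeaux East'
```

With `(?<!-)` you would need an extra `.strip()`; the positive lookbehind avoids that step. For a large column, pandas' vectorized `Series.str.replace` with `regex=True` should accept the same pattern and replacement in place of `apply`.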


Tim Biegeleisen


Excellent, thank you very much. – Efren M Jun 10 '20 at 04:46

score 2 · Answer 2 · answered Jun 10 '20 at 03:36


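This inserts a space only between a lowercase letter and a following capital letter, so a capital after a hyphen or at the start of the string is left alone: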

import re

def add_space(Neighborhood):
    return re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", Neighborhood)

add_space('Dollard-des-OrmeauxEast')
# 'Dollard-des-Ormeaux East'

add_space('RosemontCentral')
# 'Rosemont Central'
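Both answers give the same result on the two sample values from the question, but they are not equivalent. The comparison sketch below (helper names are hypothetical) runs them side by side, together with a zero-width lookaround variant of this answer that is not taken from either post. On a hypothetical input containing an acronym, `ABCDistrict`, the lookbehind version splits before every capital, while the lowercase-uppercase versions leave the string unchanged:

```python
import re

def lookbehind_split(s: str) -> str:
    # Accepted answer: space before any capital preceded by a non-hyphen.
    return re.sub(r"(?<=[^-])([A-Z])", r" \1", s)

def lower_upper_split(s: str) -> str:
    # This answer: space only between a lowercase letter and a capital.
    # The raw string avoids the invalid-escape warning for "\g".
    return re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", s)

def lookaround_split(s: str) -> str:
    # Zero-width variant of the same idea: insert a space at each position
    # with a lowercase letter before it and a capital after it.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", s)

# "ABCDistrict" is a hypothetical input used only to show the difference.
for s in ["RosemontCentral", "Dollard-des-OrmeauxEast", "ABCDistrict"]:
    print(lookbehind_split(s), "|", lower_upper_split(s), "|", lookaround_split(s))
# Rosemont Central | Rosemont Central | Rosemont Central
# Dollard-des-Ormeaux East | Dollard-des-Ormeaux East | Dollard-des-Ormeaux East
# A B C District | ABCDistrict | ABCDistrict
```

Which behavior is preferable depends on what the real values in the column look like.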


Pygirl

