I'm having trouble finding a straightforward solution to the following problem.

I have a column in a dataframe that contains strings such as:

'RosemontCentral'

'Dollard-des-OrmeauxEast'

I want to split each string at every capital letter, unless the capital letter is preceded by a hyphen.

For example:

'RosemontCentral' to 'Rosemont Central'

'Dollard-des-OrmeauxEast' to 'Dollard-des-Ormeaux East'

I have the regex function below so far. It does a fairly good job with items such as the first one, which contain no hyphenated words, but not with the ones that have hyphens. It also adds an unwanted leading space at the very beginning of the string, like this:

' Dollard-des- Ormeaux East'

import re

def add_space(Neighborhood):
    # Puts a space before every capital letter, including the first one
    return re.sub(r"([A-Z])", r" \1", Neighborhood)

df['Neighborhood'] = df['Neighborhood'].apply(add_space)

df

Thank you for your time

Kuldeep Singh Sidhu
Efren M
Possible duplicate of [how-to-do-camelcase-split-in-python](https://stackoverflow.com/questions/29916065/how-to-do-camelcase-split-in-python) – Jun 10 '20 at 03:28

2 Answers

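You may try adding a lookbehind to your regex pattern, which asserts that a non-dash character precedes the capital letter. Since the lookbehind needs some character to be there, it also skips a capital at the very start of the string, so no leading space is added: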

import re

def add_space(Neighborhood):
    return re.sub(r'(?<=[^-])([A-Z])', r' \1', Neighborhood)

df['Neighborhood'] = df['Neighborhood'].apply(add_space)
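
As a quick check, the sketch below runs the same pattern on the two sample strings from the question, without pandas. The constant name `CAPITAL_AFTER_NON_DASH` and the `samples` list are only illustrative and are not part of the original code.

```python
import re

# Hypothetical name for the pattern from this answer: a capital letter
# preceded by any character other than a hyphen.
CAPITAL_AFTER_NON_DASH = re.compile(r'(?<=[^-])([A-Z])')

def add_space(neighborhood):
    # Insert a space before each matching capital letter.
    return CAPITAL_AFTER_NON_DASH.sub(r' \1', neighborhood)

samples = ['RosemontCentral', 'Dollard-des-OrmeauxEast']
results = [add_space(s) for s in samples]
print(results)
# ['Rosemont Central', 'Dollard-des-Ormeaux East']
```

One caveat to keep in mind: every capital that follows a non-dash character gets a space, so a run of capitals such as `'NDGWest'` would come out as `'N D G West'`.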
Tim Biegeleisen

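Match a lowercase letter followed directly by an uppercase letter and insert a space between them. A capital after a hyphen never matches, and neither does one at the start of the string: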

import re

def add_space(Neighborhood):
    # Raw strings keep \g<1> and \g<2> from being treated as string escapes
    return re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", Neighborhood)

add_space('Dollard-des-OrmeauxEast')
# 'Dollard-des-Ormeaux East'

add_space('RosemontCentral')
# 'Rosemont Central'
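
A possible variation on the same idea, offered as a sketch: with a lookbehind and a lookahead the match is zero-width, so nothing has to be copied back through groups. The function name `split_camel` is hypothetical. Like the `[a-z]` version above, it only recognizes ASCII letters, so an accented lowercase letter such as `é` directly before the capital would not trigger a split.

```python
import re

def split_camel(text):
    # Match the empty position between a lowercase ASCII letter and an
    # uppercase ASCII letter, and put a single space there.
    return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', text)

print(split_camel('Dollard-des-OrmeauxEast'))
# Dollard-des-Ormeaux East
print(split_camel('RosemontCentral'))
# Rosemont Central
```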
Pygirl