0

I am trying to split some text into sentences, capitalize the first character of each sentence, and recombine the results into one string. However, capitalize() only takes effect on the first sentence. Why is that?

import re

slow = "fat chance. not going to happen! whatever next? give us  break."
mylist = re.split('([.?!])', slow)
print(mylist)       # check progress so far

out = []
for w in mylist:
    if w not in ".?!":
        w = w.capitalize()          # Why does this only work the first time?
    out.append(w)

print("".join(out))

# Output:
# ['fat chance', '.', ' not going to happen', '!', ' whatever next', '?', ' give us  break', '.', '']
# Fat chance. not going to happen! whatever next? give us  break.
martineau
acclivity
  • Just to note: your `if w not in ".?!"` is redundant here...it doesn't matter if you capitalise them, it'll be a no-op for those characters anyway... – Jon Clements Jun 19 '21 at 14:18
  • Good thinking. Thanks. – acclivity Jun 19 '21 at 14:21
  • You'll probably want to split the string into sentences first, maybe this will help? https://stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences – thebjorn Jun 19 '21 at 14:30
  • Would something like: `re.sub('(\w)(.*?)([.?!])', lambda m: m.group(1).upper() + m.expand(r'\2\3'), slow)` work for your case... ? – Jon Clements Jun 19 '21 at 14:44

5 Answers

2

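Each sentence in mylist after the first starts with a space, and capitalize() only upper-cases the first character of a string, so it has no visible effect on them. To fix that you can use strip():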

mylist = [w.strip() for w in mylist]
print(mylist)
# ['fat chance', '.', 'not going to happen', '!', 'whatever next', '?', 'give us  break', '.', '']
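To see why the leading space matters: str.capitalize() upper-cases only the character at index 0 (and lower-cases the rest), so a string that starts with a space comes back with its first letter unchanged. Below is a minimal sketch of one way to capitalize the first non-space character while keeping the leading whitespace. The helper name `capitalize_keep_space` is made up for illustration and is not part of the answer above.

```python
def capitalize_keep_space(piece):
    """Capitalize the first non-whitespace character, keeping leading whitespace.

    Hypothetical helper for illustration only.
    """
    stripped = piece.lstrip()
    # Whatever lstrip() removed is the leading whitespace we want to keep
    leading = piece[:len(piece) - len(stripped)]
    return leading + stripped.capitalize()


# capitalize() acts on the space at index 0, so the letter 'n' is untouched
print(repr(" not going".capitalize()))            # ' not going'
print(repr(capitalize_keep_space(" not going")))  # ' Not going'
```

With a helper like this, "".join(...) over the pieces from re.split would keep the original spacing between sentences.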
Saatvik Ramani
0

Thanks all. This worked (I had to strip() the final output too).

out = []
for w in mylist:
    if w not in ".?!":
        w = " " + w.lstrip().capitalize()
    out.append(w)

print("".join(out).strip())
acclivity
0

An alternative is re.sub with three groups: the first word character, then (lazily) everything up to one of . ? ! or the end of the string, and finally that separator. Upper-case the first group and append the second and third groups unchanged, e.g.:

import re

slow = "fat chance. not going to happen! whatever next? give us  break."
re.sub(r'(\w)(.*?)([.?!]|$)', lambda m: m.group(1).upper() + m.expand(r'\2\3'), slow)
# 'Fat chance. Not going to happen! Whatever next? Give us  break.'

This also preserves the original whitespace, so you don't lose it by calling strip() to make capitalize() work, and you don't have to decide which separator to use with join. For example:

slow = "        fat chance.    not going to happen    !  whatever next? give us  break"
re.sub(r'(\w)(.*?)([.?!]|$)', lambda m: m.group(1).upper() + m.expand(r'\2\3'), slow)
# '        Fat chance.    Not going to happen    !  Whatever next? Give us  break'
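One further difference worth noting: str.capitalize() also lower-cases everything after the first character, whereas the re.sub approach only touches the first letter of each match. A quick sketch, using a made-up input that contains an acronym (the text and variable names are illustrative assumptions, not from the question):

```python
import re

text = "ask NASA. they know!"   # made-up input for illustration

# capitalize() upper-cases the first character and lower-cases the rest
via_capitalize = "ask NASA".capitalize()

# re.sub upper-cases only the first word character of each sentence
via_sub = re.sub(
    r'(\w)(.*?)([.?!]|$)',
    lambda m: m.group(1).upper() + m.group(2) + m.group(3),
    text,
)

print(via_capitalize)  # Ask nasa
print(via_sub)         # Ask NASA. They know!
```

So if the text may contain acronyms or proper nouns, the substitution is probably the safer choice.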
Jon Clements
0

Could I alternatively split the input on "? " or ". " or "! ", or on "," "?" and "!" if the first 3 don't exist? What would my regex look like to achieve that?
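A rough sketch of one way that might look, assuming the "," above is meant to be "." and that "don't exist" means no punctuation-plus-whitespace boundary is found anywhere in the text. The function name `split_sentences` is hypothetical.

```python
import re

def split_sentences(text):
    """Hypothetical helper: split after . ? or ! followed by whitespace.

    If no such boundary exists, fall back to cutting after bare . ? or !
    """
    # Lookbehind keeps the punctuation attached to the sentence before it
    parts = re.split(r'(?<=[.?!])\s+', text)
    if len(parts) == 1:
        # Fallback: runs of non-punctuation plus an optional closing mark
        parts = re.findall(r'[^.?!]+[.?!]?', text)
    return [p for p in parts if p]


slow = "fat chance. not going to happen! whatever next?"
sentences = split_sentences(slow)
print(sentences)
# ['fat chance.', 'not going to happen!', 'whatever next?']

# Upper-case only the first character of each sentence, then rejoin
print(" ".join(s[:1].upper() + s[1:] for s in sentences))
# Fat chance. Not going to happen! Whatever next?
```

Note that rejoining with a single space normalizes the spacing between sentences, so this does not preserve the original whitespace the way the re.sub approach does.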

acclivity
-1

You need to call lstrip() on each piece returned by re.split to remove its leading whitespace; otherwise capitalize() acts on the space and leaves the first letter unchanged.

result = ''.join([sentence.lstrip().capitalize() for sentence in re.split('([.?!])', slow)])
print(result)

# Fat chance.Not going to happen!Whatever next?Give us  break.

If you want the first letter of each word to be a capital letter then use title():

>>> slow = "fat chance. not going to happen! whatever next? give us  break."
>>> slow.title()
'Fat Chance. Not Going To Happen! Whatever Next? Give Us  Break.'

Giorgos Myrianthous
  • I don't think the OP wants *every* word capitalised... just what they've deemed as a sentence with the first letter as a capital... eg... "Fat chance" - not "Fat Chance"... – Jon Clements Jun 19 '21 at 14:19
  • Jon is correct. Only the first letter of a SENTENCE is to be capitalized. – acclivity Jun 19 '21 at 14:23
  • @GiorgosMyrianthous you've now got a typo in your code and the output doesn't preserve the spacing between "sentences"... – Jon Clements Jun 19 '21 at 14:34