I am working with tweets and would like to replace every variation of the interjection, such as aa, aaaa, aaah, and ahhh, with the single expression 'ah'. However, my code also changes the standalone 'a' and the 'a' in 'and', which I want to leave untouched.

import re

a = 'trying a aa aaaaaa aaaah and aaaahhh aaaaaaaahhh '
re.sub('a+h*', 'ah', a)

This way I get:

Current output: 'trying ah ah ah ah ahnd ah ah '

But what I want is:

Desired output: 'trying a ah ah ah and ah ah '
jroc

1 Answer

In your current expression, a+ matches one or more a's, so it also matches the standalone 'a' and the 'a' in 'and'. You want the match to start with at least two a's.

import re

s = 'a ah aah aa'
re.sub('aa+h*', 'ah', s)  # 'a ah ah ah'
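
As a quick check, here is the same pattern applied to the string from the question. This is only a sketch; the names tweet and result are illustrative and not part of the original code.

```python
import re

# Sample string taken from the question.
tweet = 'trying a aa aaaaaa aaaah and aaaahhh aaaaaaaahhh '

# 'aa+' requires at least two a's, so the lone 'a' and the 'a' in 'and' are skipped.
result = re.sub(r'aa+h*', 'ah', tweet)
print(repr(result))  # 'trying a ah ah ah and ah ah '
```

Note that a single a followed by h's, such as 'ahhh', is not matched by this pattern, because the match has to begin with two a's.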

This can be generalized with the quantifier {m,n}, which matches between m and n occurrences; leaving out n, as in {m,}, matches m occurrences or more.

re.sub('a{2,}h*', 'ah', s)  # 'a ah ah ah'
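
One possible refinement, if the tweets may contain ordinary words with a double a: anchoring the pattern with \b word boundaries. This assumes the interjections always appear as whole words, which the question does not state, and the sample string below is made up for illustration.

```python
import re

# Hypothetical sample: 'bazaar' contains 'aa' but is not an interjection.
text = 'bazaar aaah aa a and'

# Without boundaries, the 'aa' inside 'bazaar' is rewritten as well.
unanchored = re.sub(r'a{2,}h*', 'ah', text)

# With \b on both sides, only whole tokens made of a's and h's are replaced.
anchored = re.sub(r'\ba{2,}h*\b', 'ah', text)

print(unanchored)  # bazahr ah ah a and
print(anchored)    # bazaar ah ah a and
```

Whether the anchored version is preferable depends on the data: it leaves words like 'bazaar' alone, but it would also skip an interjection glued to other letters.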
Olivier Melançon