2

I want to match the word 'St', 'St.', 'st' or 'st.', but only when it is the first word of a string. For example, in 'St. Mary Church Church St.' only the first 'St.' should match.

  • 'st. Mary Church Church St.' - should find ONLY 'st.'
  • 'st Mary Church Church St.' - should find ONLY 'st'

I want to eventually replace the first occurrence with 'Saint'.

Tomerikoo
jon jon
  • 1
    Why do you need a regex? Just split the string up into words by whitespace and get the first one. – Blender Aug 28 '16 at 16:01
  • 1
    Does the code only have to handle strings that start with a variation of "St."? Or are there other strings that start with something else? – Dartmouth Aug 28 '16 at 16:19

6 Answers

3

Regex sub allows you to define the number of occurrences to replace in a string:

import re

s = "St. Mary Church Church St."
# count=1 replaces at most one occurrence; the dots are escaped so they match a literal period.
new_s = re.sub(r'^(St\.|st\.|St|st)\s', 'Saint ', s, count=1)
print(new_s)
#  'Saint Mary Church Church St.'
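
In a regular expression an unescaped `.` matches any character, so a pattern such as `St.` would also match `Sta`. As a minimal sketch of a more compact pattern, the four variants can be written as `[Ss]t\.?`, with a lookahead so that the whitespace is not consumed and a bare 'St' with nothing after it still matches. The names `FIRST_ST` and `saintify` are hypothetical and not part of the original answer.

```python
import re

# Hypothetical names for illustration only.
# ^         anchor at the start of the string
# [Ss]t     'St' or 'st'
# \.?       optional literal period
# (?=\s|$)  must be followed by whitespace or the end of the string
FIRST_ST = re.compile(r'^[Ss]t\.?(?=\s|$)')

def saintify(s):
    # The lookahead does not consume the whitespace, so spacing is preserved.
    return FIRST_ST.sub('Saint', s, count=1)

print(saintify("St. Mary Church Church St."))
print(saintify("Stone Church St."))
```

Because of the lookahead, a first word that merely begins with 'St' (such as 'Stone') is left alone.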
Tomerikoo
JazZ
  • This nearly works, but it needs one small fix; if you substitute strings starting with "St" or "st", there is no space after "Saint", so `re.sub(…)`ing `s = "St Mary Church Church St."` gives 'SaintMary Church Church St.' – Dartmouth Aug 28 '16 at 18:02
  • Thanks for pointing it out. As you saw in the example, though, the output was correct. Anyway, I edited my answer to handle the "st" expression followed by a space and added a space after "Saint". Thank you. ; ) – JazZ Aug 28 '16 at 18:46
2

You don't need a regex for this. Use the split() method to split your string on whitespace, which returns a list of every word in the string:

matches = ["St", "St.", "st", "st."]
name = "St. Mary Church Church St."

words = name.split()  # split the string into a list of words
if words and words[0] in matches:  # guard against an empty string
    words[0] = "Saint"  # replace the first word in the list (St.) with Saint
new_name = " ".join(words)  # create the new name from the words, separated by spaces
print(new_name)  # Output: "Saint Mary Church Church St."
Tomerikoo
Dartmouth
  • That's nice, but it doesn't check if the first word is St, or .. etc. – joel goldstick Aug 28 '16 at 16:15
  • OP isn't supplying enough information whether there are strings that don't start with a variation of "St."... I'll update my answer though. – Dartmouth Aug 28 '16 at 16:17
  • The [split](https://docs.python.org/2/library/stdtypes.html?highlight=str.split#str.split) method accepts a maxsplit argument. It could be nice to provide it, to avoid processing the rest of the string after the first split is found. – Tryph Aug 29 '16 at 13:14
  • @Tryph, that's quite useful, although it doesn't really simplify the current code. Besides, there is basically no processing of the rest of the string, only when `join`ing the words again. – Dartmouth Aug 29 '16 at 14:01
  • It is possible to remove the need of `matches` by doing `if words[0].lower().strip('.') == 'st':` – Tomerikoo Jan 30 '23 at 10:37
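
The two suggestions in the comments above (the `maxsplit` argument and a case-insensitive comparison) can be combined. This is only a sketch: `expand_saint` is a hypothetical name, and it uses `rstrip('.')` rather than `strip('.')` so that only trailing periods are ignored.

```python
def expand_saint(name):
    # Hypothetical helper, not from the original answer.
    # maxsplit=1 separates the first word from the rest of the string
    # without splitting the remainder into individual words.
    parts = name.split(None, 1)
    # rstrip('.') drops a trailing period; lower() makes the check case-insensitive.
    if parts and parts[0].lower().rstrip('.') == 'st':
        parts[0] = 'Saint'
    return ' '.join(parts)

print(expand_saint("St. Mary Church Church St."))
print(expand_saint("Mary Church Church St."))
```

Note that this comparison is looser than the explicit `matches` list: it also accepts spellings such as 'ST' or 'sT.'.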
1

You can pass the flags parameter to re.sub. With re.IGNORECASE the pattern no longer has to list every capitalization, which keeps the code a little cleaner and reduces the chance of missing a variant:

import re

s = "St. Mary Church Church St."
# re.IGNORECASE covers both capitalizations; the dot is escaped so it matches a literal period.
new_s = re.sub(r'^st\.?\s', 'Saint ', s, count=1, flags=re.IGNORECASE)
print(new_s)
#  'Saint Mary Church Church St.'
Tomerikoo
Robert Hadsell
0

Try using the regex '^\S+' to match the first run of non-whitespace characters (that is, the first word) in your string.

import re 

s = 'st Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st'

s = 'st. Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st.'
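
The match above only finds the first word; it does not decide whether that word is a variation of 'St', nor does it replace it. One possible way to finish the job, as a sketch, is to compare the matched word against the accepted spellings and rebuild the string from `m.end()`. The names `VARIANTS` and `replace_first_word` are hypothetical.

```python
import re

# Hypothetical names for illustration; the set lists the spellings from the question.
VARIANTS = {"St", "St.", "st", "st."}

def replace_first_word(s):
    # re.match only matches at the start of the string, so no '^' is needed.
    m = re.match(r'\S+', s)
    if m and m.group() in VARIANTS:
        # Keep everything after the first word exactly as it was.
        return "Saint" + s[m.end():]
    return s

print(replace_first_word("st Mary Church Church St."))
print(replace_first_word("st. Mary Church Church St."))
```

A string that begins with whitespace is returned unchanged, because `\S+` cannot match at position 0 in that case.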
pylang
orz
  • @orz, it may be your first time, so your answer has been edited to show what might be expected next time, format-wise. Remember to format code with code blocks, use reproducible examples that run in console and briefly explain what is happening. And welcome to SO. – pylang Aug 28 '16 at 20:49
0

Python 3.10 introduced a new Structural Pattern Matching feature (otherwise known as match/case) which can fit this use-case:

s = "St. Mary Church Church St."

words = s.split()
match words:
    case ["St" | "St." | "st" | "st.", *rest]:
        print("Found st at the start")
        words[0] = "Saint"
    case _:
        print("didn't find st at the start")

print(' '.join(words))

Will give:

Found st at the start
Saint Mary Church Church St.

While using s = "Mary Church Church St." will give:

didn't find st at the start
Mary Church Church St.
Tomerikoo
-2
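import re

string = "St. Mary Church Church St."

replace = {'St': 'Saint', 'St.': 'Saint', 'st': 'Saint', 'st.': 'Saint'}
# Escape the keys only when building the pattern, so lookups use the raw text.
# (dict.iteritems() exists only in Python 2, so it is not used here.)
pattern = re.compile("|".join(re.escape(k) for k in replace))
words = string.split()
# Only the first word is eligible, and it must match a key in full.
if words and pattern.fullmatch(words[0]):
    words[0] = replace[words[0]]
result = " ".join(words)  # 'Saint Mary Church Church St.'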

This should work I guess, please check. Source

Community
Jeril