Add a space between Persian letters and digits with Python re

Question

I want to add a space between a Persian letter and a number, like this:

"سعید123" should be converted to "سعید 123"

The Java code for this is:

str.replaceAll("(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})", " ")

But I can't find a Python solution.

Well, if you want to insert the space between any Unicode letter and digit, use `re.sub(r'(?u)([^\W\d_])(\d)', r'\1 \2', s)`. Note that `(?u)` is only needed in Python 2.x. In Python 3.x, the patterns are Unicode-aware by default and you may remove it, `r'([^\W\d_])(\d)'`. — Wiktor Stribiżew, May 07 '18 at 11:42

score 1 · Answer 1 · answered May 07 '18 at 11:27

I am not sure if this is a correct approach.

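import re

k = "سعید123"
# Find the first run of digits (a raw string avoids an invalid-escape warning).
m = re.search(r"(\d+)", k)
if m:
    # Put the digits first, then the rest of the string, separated by a space.
    k = " ".join([m.group(), k.replace(m.group(), "")])
    print(k)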

Output:

123 سعید

Rakesh

It works only when the string contains a single number. What I want is something that also works in a case like: "4سعید5سعید" – Saeed Bibak May 07 '18 at 11:36

score 1 · Answer 2 · answered May 07 '18 at 11:50

You may use

re.sub(r'([^\W\d_])(\d)', r'\1 \2', s, flags=re.U)

Note that in Python 3.x, the re.U flag is redundant as the patterns are Unicode-aware by default.

Pattern details

([^\W\d_]) - Capturing group 1: any Unicode letter (literally, any char other than a non-word char, a digit or an underscore)
(\d) - Capturing group 2: any Unicode digit

The replacement pattern is a combination of the Group 1 and 2 placeholders (referring to corresponding captured values) with a space in between them.

You may use a variation of the regex with a lookahead:

re.sub(r'[^\W\d_](?=\d)', r'\g<0> ', s)
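As a rough illustration (not part of the original answer), the two variants above can be wrapped in small helpers and compared. The function names and sample strings are made up for this sketch, and Python 3 is assumed, so no re.U flag is passed. Note that both variants only handle a letter followed by a digit; a digit followed by a letter is left untouched.

```python
import re

# [^\W\d_] matches a word character that is neither a digit nor an underscore,
# i.e. a letter in any script.
LETTER_THEN_DIGIT = re.compile(r'([^\W\d_])(\d)')
LETTER_BEFORE_DIGIT = re.compile(r'[^\W\d_](?=\d)')


def space_with_groups(s):
    # Re-emit both captured characters with a space between them.
    return LETTER_THEN_DIGIT.sub(r'\1 \2', s)


def space_with_lookahead(s):
    # Only the letter is consumed; the digit is checked by the lookahead.
    return LETTER_BEFORE_DIGIT.sub(r'\g<0> ', s)


name = "سعید"  # sample word taken from the question
single = name + "123"
mixed = "4" + name + "5" + name

print(space_with_groups(single))
print(space_with_lookahead(single))
# In the mixed sample only the letter-to-digit boundary gets a space.
print(space_with_groups(mixed))
```

Both helpers should give the same result on these inputs; the lookahead form simply avoids the second capturing group.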

revo · Accepted Answer · 2018-05-07T12:42:30.497

There is a short regex which you may rely on to match the boundary between letters and digits (in any language):

\d(?=[^_\d\W])|[^_\d\W](?=\d)

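Breakdown:

\d(?=[^_\d\W]) - a digit that is immediately followed by a letter
| - or
[^_\d\W](?=\d) - a letter that is immediately followed by a digit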

Python:

re.sub(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)', r'\g<0> ', str, flags=re.UNICODE)

But according to this answer, this is the right way to accomplish this task:

re.sub(r'\d(?=[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی])|[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی](?=\d)', r'\g<0> ', str,  flags = re.UNICODE)
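A minimal sketch, assuming Python 3, that puts the two patterns from this answer side by side. The helper names `space_any_script` and `space_persian_only` and the sample strings are hypothetical; the letter list is copied from the expression above. It is only meant to show how the results differ, not to settle which pattern is right for a given data set.

```python
import re

# Letter list copied from the explicit character class in the answer.
PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"

# Digit followed by a letter, or letter followed by a digit (any script).
ANY_SCRIPT = re.compile(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)')

# Same idea, but a "letter" is restricted to the listed Persian letters.
PERSIAN_ONLY = re.compile(
    rf'\d(?=[{PERSIAN_LETTERS}])|[{PERSIAN_LETTERS}](?=\d)'
)


def space_any_script(text):
    # Append a space after the character that sits just before the boundary.
    return ANY_SCRIPT.sub(r'\g<0> ', text)


def space_persian_only(text):
    return PERSIAN_ONLY.sub(r'\g<0> ', text)


name = "سعید"  # sample word taken from the question
mixed = "4" + name + "5" + name  # the multi-number case from the comments

print(space_any_script(name + "123"))
print(space_any_script(mixed))
print(space_persian_only(mixed))
# Latin text is split by the generic pattern but left alone by the other.
print(space_any_script("abc123"), space_persian_only("abc123"))
```

The generic pattern also separates letters and digits in other scripts, while the explicit class leaves anything outside the listed letters untouched, which may or may not be what is wanted.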
