I want to insert a space between Persian letters and digits, like this:

"سعید123" should become "سعید 123"

The Java code for this looks like the following:

str.replaceAll("(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})", " ")

But I can't find a Python solution.

Wiktor Stribiżew
Saeed Bibak
    @WiktorStribiżew I think Python is wiser than that :) – Saeed Bibak May 07 '18 at 11:37
  • Well, if you want to insert the space between any Unicode letter and digit, use `re.sub(r'(?u)([^\W\d_])(\d)', r'\1 \2', s)`. Note that `(?u)` is only needed in Python 2.x. In Python 3.x, the patterns are Unicode-aware by default and you may remove it, `r'([^\W\d_])(\d)'`. – Wiktor Stribiżew May 07 '18 at 11:42
  • @SaeedBibak are you interested in a job in tehran? – Iman Nia Sep 30 '20 at 18:59

3 Answers

1

I am not sure if this is a correct approach.

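import re
k = "سعید123"
# Find the first run of digits (raw string, so \d is not treated as a string escape).
m = re.search(r"(\d+)", k)
if m:
    # Move that run to the front and join the two parts with a space.
    k = " ".join([m.group(), k.replace(m.group(), "")])
    print(k)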

Output:

123 سعید
Rakesh
  • It works only when the string contains a single number. What I want is something that also works in cases like "4سعید5سعید" – Saeed Bibak May 07 '18 at 11:36
1

You may use

re.sub(r'([^\W\d_])(\d)', r'\1 \2', s, flags=re.U)

Note that in Python 3.x, the re.U flag is redundant, as patterns are Unicode-aware by default.

See the online Python demo and a regex demo.

Pattern details

  • ([^\W\d_]) - Capturing group 1: any Unicode letter (literally, any char that is not a non-word char, a digit, or an underscore)
  • (\d) - Capturing group 2: any Unicode digit

The replacement pattern is a combination of the Group 1 and 2 placeholders (referring to corresponding captured values) with a space in between them.
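As a small illustration of the two-group replacement described above, here is a minimal sketch for Python 3. The function name `space_letter_digit` and the sample strings are only illustrative and are not part of the original answer.

```python
import re

def space_letter_digit(text):
    # Group 1: one Unicode letter; group 2: the digit immediately after it.
    # The replacement puts both back with a single space in between.
    return re.sub(r'([^\W\d_])(\d)', r'\1 \2', text)

name = "سعید"
print(space_letter_digit(name + "123"))   # the name, a space, then 123
print(space_letter_digit("abc123"))       # abc 123
```

Note that this pattern only covers a letter followed by a digit. A digit followed by a letter, as in the leading "4" of the example from the comments, is left untouched.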

You may use a variation of the regex with a lookahead:

re.sub(r'[^\W\d_](?=\d)', r'\g<0> ', s)

See this regex demo.
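The lookahead variant can be tried the same way. This is a sketch assuming Python 3; `space_before_digits` is a hypothetical helper name. Because the lookahead does not consume the digit, only the letter is matched, and `\g<0>` puts that letter back followed by a space.

```python
import re

def space_before_digits(text):
    # Match one Unicode letter, but only where a digit follows it.
    # \g<0> is the whole match (the letter); a space is appended after it.
    return re.sub(r'[^\W\d_](?=\d)', r'\g<0> ', text)

print(space_before_digits("abc123"))  # abc 123
print(space_before_digits("a1b2"))    # a 1b 2
```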

Wiktor Stribiżew
1

There is a short regex you can rely on to match the boundary between letters and digits (in any language):

\d(?=[^_\d\W])|[^_\d\W](?=\d)

Live demo

Breakdown:

  • \d Match a digit
  • (?=[^_\d\W]) that is immediately followed by a letter (in any script)
  • | Or
  • [^_\d\W] Match a letter (in any script)
  • (?=\d) that is immediately followed by a digit

Python:

re.sub(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)', r'\g<0> ', s, flags=re.UNICODE)
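To check the behaviour on the multi-number case raised in the comments, here is a minimal Python 3 sketch that uses the positive-lookahead pattern from the breakdown. The names `BOUNDARY` and `space_at_boundaries` are illustrative assumptions, not part of the original answer.

```python
import re

# A "letter" here is a word character that is neither a digit nor an underscore.
# Alternative 1: a digit followed by a letter; alternative 2: a letter followed by a digit.
BOUNDARY = re.compile(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)')

def space_at_boundaries(text):
    # Re-insert the matched character and append one space after it.
    return BOUNDARY.sub(r'\g<0> ', text)

name = "سعید"
print(space_at_boundaries("4" + name + "5" + name))  # spaces at all three boundaries
print(space_at_boundaries("abc123"))                 # abc 123
```

Positive lookaheads matter here: a negative lookahead such as `(?![_\d\W])` also succeeds at the end of the string, so a trailing digit or letter would get an unwanted space appended.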

But according to this answer, this is the right way to accomplish this task:

re.sub(r'\d(?=[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی])|[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی](?=\d)', r'\g<0> ', s, flags=re.UNICODE)
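The explicit character class restricts the match to the listed Persian letters, so text in other scripts is left alone. The sketch below, assuming Python 3, builds the same pattern from a single string so the letter list is written only once. The names `PERSIAN_LETTERS` and `space_persian` are hypothetical, and the letter list is copied from the line above without any claim that it is complete.

```python
import re

# Letter list copied from the pattern above; it contains no regex metacharacters.
PERSIAN_LETTERS = "آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"
LETTER = "[" + PERSIAN_LETTERS + "]"

# A digit followed by a listed letter, or a listed letter followed by a digit.
PATTERN = re.compile(r"\d(?=" + LETTER + ")|" + LETTER + r"(?=\d)")

def space_persian(text):
    return PATTERN.sub(r"\g<0> ", text)

name = "سعید"
print(space_persian(name + "123"))  # the name, a space, then 123
print(space_persian("abc123"))      # abc123 (Latin letters are not in the class)
```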
revo