
I have to search a string for words that have a number as a prefix or suffix (for example, "abc21" or "943xyz"). Then I need to split the number from the word.

For example, "abc12" has to be converted to "abc 12", and "12abc" has to be converted to "12 abc".

However, if the number lies between letters, as in "a12bc", the word should be left as it is. How can I do this? Is there a simpler way than regex?

    Please show what you have tried already to solve this problem. – roganjosh Jan 25 '18 at 21:17
  • this is pretty close : https://stackoverflow.com/questions/430079/how-to-split-strings-into-text-and-number – jmunsch Jan 25 '18 at 21:18
  • @jmunsch pretty close + too broad = closing to me :) thanks for the link – Jean-François Fabre Jan 25 '18 at 21:19
  • That's a pretty far-fetched duplicate, so I'm gonna leave a hint for the OP: The regex in that question `\D+\d+` matches only words with digits at the end. Duplicate that and turn it around, you get `\D+\d+|\d+\D+` which matches words with digits on either end. From there you just need to figure out how to insert a space. (Hint #2: `re.sub`) – Aran-Fey Jan 25 '18 at 21:23
  • @Rawing actually, I think `\w` matches alphanumeric, so that might not play well, perhaps best to use `\d\D` – juanpa.arrivillaga Jan 25 '18 at 21:28
  • @juanpa.arrivillaga Oops, nice catch. Fixed, thanks. – Aran-Fey Jan 25 '18 at 21:28
  • @Rawing a somewhat hacky approach: `re.sub(r'((\D+)(\d+))|((\d+)(\D+))', r"\2 \3\5 \6", '943xyz').strip()` I'm not sure if I grasp grouping correctly. – juanpa.arrivillaga Jan 25 '18 at 21:30
  • @juanpa.arrivillaga You don't need that many groups. `re.sub(r'(\D+)(\d+)|(\d+)(\D+)', r"\1\3 \2\4", '943xyz')` works too :) – Aran-Fey Jan 25 '18 at 21:33
  • @Rawing yep, was definitely over-doing it. Wasn't sure about the precedence of alternation in regex... – juanpa.arrivillaga Jan 25 '18 at 21:34
  • hey also welcome to stackoverflow check these out when you get time : https://stackoverflow.com/tour AND https://stackoverflow.com/help/how-to-ask AND https://meta.stackexchange.com/questions/21788/how-does-editing-work – jmunsch Jan 25 '18 at 21:35

3 Answers


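Something simple like one of these.
All that's needed is to guard the boundaries with (?<![\da-z]) ... (?![\da-z]),
which does 2 things:
- it stops a match from starting or ending inside a run of letters or digits (so the "12bc" part of "a12bc" can't be matched on its own);
- it ensures no letter or digit bookends the match, so the whole token must be letters+digits or digits+letters.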

Way 1:

Find (?<![\da-z])(?:([a-z]+)(\d+)|(\d+)([a-z]+))(?![\da-z])
Replace $1$3 $2$4

https://regex101.com/r/k4gNoE/1

 (?<! [\da-z] )
 (?:
      ( [a-z]+ )             # (1)
      ( \d+ )                # (2)
   |  
      ( \d+ )                # (3)
      ( [a-z]+ )             # (4)
 )
 (?! [\da-z] )
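A minimal sketch of Way 1 in Python, assuming the input may contain several words. Python's re module writes replacement backreferences as \1 rather than $1, and since Python 3.5 it replaces unmatched groups with an empty string, which is what makes \1\3 \2\4 work for both branches. The names WAY1 and split_way1 are illustrative only. The class [a-z] covers lowercase letters only; passing re.IGNORECASE would extend it to uppercase.

```python
import re

# Way 1 translated to Python: the lookbehind/lookahead reject a letter or digit
# on either side, so only whole letters+digits or digits+letters tokens match.
WAY1 = re.compile(r'(?<![\da-z])(?:([a-z]+)(\d+)|(\d+)([a-z]+))(?![\da-z])')

def split_way1(text):
    # Either groups 1/2 or groups 3/4 participate; the unused pair becomes
    # an empty string (Python 3.5+), so the output has no duplicated text.
    return WAY1.sub(r'\1\3 \2\4', text)

print(split_way1("abc12 12abc a12bc"))  # abc 12 12 abc a12bc
```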

Way 2:

Find (?<![\da-z])(?:([a-z]+(?=\d)|\d+(?=[a-z]))((?<=\d)[a-z]+|(?<=[a-z])\d+))(?![\da-z])
Replace $1 $2

https://regex101.com/r/LbWnkg/1

 (?<! [\da-z] )
 (?:
      (                        # (1 start)
           [a-z]+ 
           (?= \d )
        |  \d+ 
           (?= [a-z] )
      )                        # (1 end)
      (                        # (2 start)
           (?<= \d )
           [a-z]+ 
        |  (?<= [a-z] )
           \d+ 
      )                        # (2 end)
 )
 (?! [\da-z] )
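A sketch of Way 2 in Python, under the same assumptions (lowercase [a-z] only; WAY2 and split_way2 are illustrative names). Here both groups take part in every match, so the replacement is just \1 \2 and does not rely on unmatched groups turning into empty strings. All lookbehinds are one character wide, which Python's re requires.

```python
import re

# Way 2 translated to Python. Group 1 is the leading run (letters followed by a
# digit, or digits followed by a letter); group 2 is the trailing run of the
# other kind, checked with a one-character lookbehind.
WAY2 = re.compile(
    r'(?<![\da-z])'
    r'(?:([a-z]+(?=\d)|\d+(?=[a-z]))'
    r'((?<=\d)[a-z]+|(?<=[a-z])\d+))'
    r'(?![\da-z])'
)

def split_way2(text):
    # Both groups always participate in a match, so "\1 \2" is safe as-is.
    return WAY2.sub(r'\1 \2', text)

print(split_way2("abc12 12abc a12bc"))  # abc 12 12 abc a12bc
```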

You can try this:

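import re
# Split a single word into its leading/trailing number and letters; "a12bc"-style words stay intact.
def split_vals(s):
    return ' '.join(re.findall(r'^\d+|\d+$|^[a-zA-Z]+\d+[a-zA-Z]+$|^[a-zA-Z]+$|[a-zA-Z]+', s))
s = ["abc21", "943xyz", '12abc', "a12bc"]
new_s = list(map(split_vals, s))
print(new_s)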

Output:

['abc 21', '943 xyz', '12 abc', 'a12bc']
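Since the question also asks whether there is a simpler way than regex, here is a sketch without regex, assuming words are separated by whitespace. It groups each word into runs of digits and non-digits with itertools.groupby: exactly two runs means the number is a pure prefix or suffix. The helper names split_word and split_sentence are hypothetical.

```python
from itertools import groupby

def split_word(word):
    # Group consecutive characters by whether they are digits: "abc12" -> ["abc", "12"].
    runs = ["".join(g) for _, g in groupby(word, key=str.isdigit)]
    # Two runs: the number is a prefix or suffix, so split it off.
    # One run (all letters or all digits) or three+ runs ("a12bc") stay unchanged.
    return " ".join(runs) if len(runs) == 2 else word

def split_sentence(text):
    # Assumes whitespace-separated words; punctuation counts as a non-digit.
    return " ".join(split_word(w) for w in text.split())

print(split_sentence("abc21 943xyz a12bc hello"))  # abc 21 943 xyz a12bc hello
```

Note that str.split() collapses runs of whitespace into a single space in the output.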
Ajax1234

You can use re.sub to insert that space:

re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", word)

This matches non-digits followed by digits, or digits followed by non-digits.

The \b boundaries make sure the word is matched in its entirety, so that we don't match numbers in the middle of a word.

The replacement pattern \1\3 \2\4 takes advantage of the fact that unmatched groups are replaced with the empty string (Python 3.5+; earlier versions raise an "unmatched group" error). We know that either groups 1 and 2 or groups 3 and 4 will match, and the other groups will be empty, so \1\3 \2\4 will always produce a valid result (without duplicating any part of the input).


Examples:

>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "abc12")
'abc 12'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "12abc")
'12 abc'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "a12bc")
'a12bc'
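One caveat when the input is a whole sentence rather than a single word: \D matches any non-digit, including spaces, so a match can run across a space (with the pattern above, "12 abc34" comes out as "12  abc 34", with a doubled space). A sketch that restricts the non-digit side to ASCII letters, assuming words consist only of a-z, A-Z and digits; PATTERN and split_affixed_numbers are hypothetical names:

```python
import re

# Same structure as the pattern above, but [a-zA-Z]+ replaces \D+ so a match
# cannot span whitespace or punctuation between words.
PATTERN = re.compile(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b')

def split_affixed_numbers(text):
    # Groups 1/2 or 3/4 match; the other pair is empty (Python 3.5+).
    return PATTERN.sub(r'\1\3 \2\4', text)

print(split_affixed_numbers("12 abc34 a12bc 943xyz"))  # 12 abc 34 a12bc 943 xyz
```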
Aran-Fey