8

I am trying to separate non-numbers from numbers in a Python string. Numbers can include floats.

Examples

Original String               Desired String
'4x5x6'                       '4 x 5 x 6'
'7.2volt'                     '7.2 volt'
'60BTU'                       '60 BTU'
'20v'                         '20 v'
'4*5'                         '4 * 5'
'24in'                        '24 in'

Here is a very good thread on how to achieve just that in PHP:

Regex: Add space if letter is adjacent to a number

I would like to manipulate the strings above in Python.

The following piece of code works for the first example, but not for the others:

import re

new_element = []
# unit is the list of input strings
result = [re.split(r'(\d+)', s) for s in unit]
for elements in result:
    for element in elements:
        if element != '':
            new_element.append(element)

    new_element = ' '.join(new_element)
    break
Alejandro Simkievich
  • Post your try please – Iron Fist Apr 17 '16 at 19:55
  • So did you try anything from that thread? I am sure `r'(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])'` is very close. – Wiktor Stribiżew Apr 17 '16 at 19:55
  • hi, unfortunately re.sub('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', ' ','7.2volt') does not return '7.2 volt'. nor re.sub('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', ' ','4x5x6') returns '4 x 5 x 6' – Alejandro Simkievich Apr 17 '16 at 19:58
  • You cannot pass a regex object to a `re` method, only pass the pattern itself (see the suggestion above - which is not working for you, but close). – Wiktor Stribiżew Apr 17 '16 at 20:00
  • hi there. thanks a lot, re.sub(r'(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])', ' ',my_string) works pretty well. It does not work for non-letters like '*' but it is close enough. thanks again. – Alejandro Simkievich Apr 17 '16 at 20:05
  • I guess you just need to match numbers and enclose them with spaces. See [this thread](http://stackoverflow.com/questions/385558/extract-float-double-value) and [this demo](http://ideone.com/sB1pYF). – Wiktor Stribiżew Apr 17 '16 at 20:11
  • Hi Wiktor, thanks a lot for the demo you posted. I posted a function below where the core of it is your code. – Alejandro Simkievich Apr 17 '16 at 20:15
  • I think your question is now a duplicate of [*Extract float/double value*](http://stackoverflow.com/questions/385558/extract-float-double-value). – Wiktor Stribiżew Apr 17 '16 at 20:17
  • btw, Aminah's solution below is the most elegant – Alejandro Simkievich Apr 17 '16 at 20:19
  • I hate to be the bearer of bad news, but you can't separate numbers in free form text, because you cannot differentiate the dot `.` as being a punctuation or a decimal point. If you can do this let me know. If it doesn't matter then what's the use. –  Apr 18 '16 at 01:15
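
For reference, the lookaround pattern discussed in the comments above can be sketched as follows. The helper name `space_letters_digits` is made up for illustration; the pattern is the one suggested in the comments. It inserts a space at every letter/digit boundary, which is why it leaves non-letter separators such as `*` untouched.

```python
import re

# Zero-width boundary between a letter and a digit, in either order.
BOUNDARY = re.compile(r'(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])')

def space_letters_digits(s):
    """Insert a space wherever a letter touches a digit (hypothetical helper)."""
    return BOUNDARY.sub(' ', s)

for s in ['4x5x6', '7.2volt', '60BTU', '4*5']:
    print(repr(s), '->', repr(space_letters_digits(s)))
```

Because the pattern only looks at letters and digits, `'4*5'` comes back unchanged, which matches the limitation noted in the comments.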

2 Answers

12

Easy! Just substitute with a regex and refer back to the captured group. Don't forget to strip the surrounding whitespace. Please try this code:

import re
the_str = "4x5x6"
# \1 refers to the first captured group in ()
print(re.sub(r"([0-9]+(\.[0-9]+)?)", r" \1 ", the_str).strip())
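
A slightly more defensive sketch of the same idea, wrapped in a function. The name `separate_numbers` is an assumption for illustration, not part of the original answer. It pads each number with spaces as above, then collapses runs of whitespace, so input that already contains spaces should not end up with double spaces. Note that this also normalizes any other whitespace in the input.

```python
import re

# An integer or a decimal number such as 7.2 (no sign, no exponent).
NUMBER = re.compile(r'([0-9]+(?:\.[0-9]+)?)')

def separate_numbers(s):
    """Surround every number with spaces, then normalize whitespace."""
    padded = NUMBER.sub(r' \1 ', s)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return ' '.join(padded.split())

for s in ['4x5x6', '7.2volt', '60BTU', '20v', '4*5', '24in']:
    print(repr(s), '->', repr(separate_numbers(s)))
```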
Aminah Nuraini
3

I used split, like you did, but modified it like this:

>>> import re
>>> tcs = ['123', 'abc', '4x5x6', '7.2volt', '60BTU', '20v', '4*5', '24in', 'google.com-1.2', '1.2.3']
>>> pattern = r'(-?[0-9]+\.?[0-9]*)'
>>> for test in tcs: print(repr(test), repr(' '.join(segment for segment in re.split(pattern, test) if segment)))
'123' '123'
'abc' 'abc'
'4x5x6' '4 x 5 x 6'
'7.2volt' '7.2 volt'
'60BTU' '60 BTU'
'20v' '20 v'
'4*5' '4 * 5'
'24in' '24 in'
'google.com-1.2' 'google.com -1.2'
'1.2.3' '1.2 . 3'

Seems to have the desired behavior.

Note that re.split returns an empty string at the beginning/end of the list when the input starts or ends with a match, so you have to remove those empty strings before joining. See this question for an explanation.
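
To see where the empty strings come from, here is a small sketch using the same pattern as above. The variable names are illustrative only. Because `'7.2volt'` starts with a match, `re.split` reports an empty string in front of it, and joining without filtering leaves a stray leading space.

```python
import re

pattern = r'(-?[0-9]+\.?[0-9]*)'

parts = re.split(pattern, '7.2volt')
print(parts)                # ['', '7.2', 'volt']

# Joining everything keeps the empty first element as a leading space.
joined_raw = ' '.join(parts)
print(repr(joined_raw))     # ' 7.2 volt'

# Filtering out empty segments first gives the desired result.
joined_clean = ' '.join(p for p in parts if p)
print(repr(joined_clean))   # '7.2 volt'
```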

André Laszlo