3

I need to split a string in Python without removing the delimiter.

For example:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
content = re.split(r'\s\d\s', content)

After this I get:

This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.

but I want this:

This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string.
Viacheslav Kondratiuk
Johnny
  • Possible duplicate of [Python split() without removing the delimiter](http://stackoverflow.com/questions/7866128/python-split-without-removing-the-delimiter) – xandermonkey Jul 12 '16 at 11:40
  • Possible duplicate of [In Python, how do I split a string and keep the separators?](http://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators) – zondo Jul 12 '16 at 11:49
  • The delimiter (a number) is inside the "content" string itself, so we can't specify it beforehand. – Johnny Jul 12 '16 at 12:20

5 Answers

2

Use the re module from Python's standard library. With re.sub you can find every match of a regex and replace it with the string you want. \g<0> in the replacement refers to the whole match (in this case the number together with the whitespace around it).

Example:

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)

Result would be :

'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'

Here are more in-depth details about re.sub.
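If the leading space on each new line is unwanted, one possible variation is to capture only the digit and the space after it, then split the result on newlines to get a list. This is a minimal sketch, not part of the original answer; the name paragraphs is only illustrative.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Match "<space><digit><space>" but put back only "<digit><space>",
# preceded by a newline, so each new line starts with its number.
result = re.sub(r'\s(\d\s)', r'\n\1', content)

# Turn the newline-separated text into a list of pieces.
paragraphs = result.split('\n')
print(paragraphs)
```

Because the group captures the digit, the delimiter survives the substitution; only the whitespace in front of it is replaced.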

mitghi
2

You could use re.split with a lookahead assertion:

import re
re.split(r'\s(?=\d\s)', content)

resulting in:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

This splits on spaces -- but only those which are immediately followed by a digit then another space.
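The pattern above only recognises single-digit numbers. Assuming the markers might grow past 9 (the question does not say), a small sketch with \d+ in the lookahead handles longer numbers as well; the sample string here is made up for illustration.

```python
import re

# Hypothetical sample with a two-digit marker.
sample = 'intro 9 ninth part 10 tenth part'

# Split on whitespace that is followed by one or more digits and whitespace.
# The lookahead consumes nothing, so the number stays with the next piece.
pieces = re.split(r'\s(?=\d+\s)', sample)
print(pieces)

# With the single-digit pattern, the space before "10" is not a split point.
single = re.split(r'\s(?=\d\s)', sample)
print(single)
```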

John Coleman
0

Why not just store the output, iterate over it, and place your delimiters back where you want them? If the delimiters need to change each time, you could use the index of the loop that you use to iterate to decide what they are/need to be.

You might find this post useful.
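A rough sketch of that idea, under the assumption that the delimiters are collected with re.findall using the same pattern as the split; the names pieces, delims and rebuilt are illustrative only.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise.'
pattern = r'\s\d\s'

pieces = re.split(pattern, content)    # text between the delimiters
delims = re.findall(pattern, content)  # the delimiters themselves, in order

# The first piece has no delimiter in front of it. Every later piece gets
# back the delimiter that preceded it, minus its leading whitespace.
rebuilt = [pieces[0]]
for delim, piece in zip(delims, pieces[1:]):
    rebuilt.append(delim.lstrip() + piece)

print(rebuilt)
```

This takes two passes over the string, so it is mainly useful when the delimiters need to be inspected or changed before they are put back.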

xandermonkey
0

You can try this:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[i.group(0).strip() for i in re.finditer(r'\S\d?[^\d]+', content)]

Each match stops as soon as it reaches a digit, but a digit at the very start of a match is allowed.

Following is the output:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
Kevin
0

If it is only a question of newlines, use the string method splitlines() with keepends=True:

>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']

Otherwise you can write a function that keeps an arbitrary delimiter at the end of each piece:

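def split(s, d="\n"):
    """Split s on d, keeping d at the end of each piece."""
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    f = s.find(d)
    if f == -1:
        return [s]  # Delimiter not present: nothing to split
    l = []
    li = 0  # Last index
    add = len(d)
    while f != -1:
        # Include the delimiter itself in the piece
        l.append(s[li:f + add])
        li = f + add
        f = s.find(d, li)
    e = s[li:]  # Whatever remains after the last delimiter
    if e:
        l.append(e)
    return l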
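For comparison, the same behaviour can probably be had more compactly with str.partition, which returns the text before the separator, the separator itself, and the remainder. This is an alternative sketch rather than the function above, and the name split_keep is hypothetical. Unlike the function above, it returns an empty list for an empty input string.

```python
def split_keep(s, sep="\n"):
    """Split s on sep, keeping sep at the end of each piece."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    while s:
        # found is sep when it occurs in s, otherwise an empty string
        head, found, s = s.partition(sep)
        parts.append(head + found)
    return parts


print(split_keep("This\nis\na\ntest"))
print(split_keep("a, b, c", ", "))
```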
Dalen