How can I split a string if a separator is repeated twice?

Question

I need to transform the string 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG' into a list of tuples: [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]. I think I first need to split the string on the separator '+' and then split each piece on the separator '/'.

The problem is that some plus signs come in pairs. In each pair, the first plus sign is a separator and the second one has to be kept as data. If I simply split the string on '+', all plus signs are removed:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.split('+')
print(x)
#['apple/SP', '', '/SW', 'orange/NNG', '', '/FG', 'melon/SL', 'food/JKG']

If I split on the separator '++' instead:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
splitted_s = s.split('++')
print(splitted_s)
#['apple/SP', '/SW+orange/NNG', '/FG+melon/SL+food/JKG']

I have no idea how to get to the result [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')].

I think you meant `s.split('++')` in the second code example — FountainTree, Dec 13 '21 at 01:20

score 1 · Answer 1 · answered Dec 12 '21 at 00:02

You could use a regular expression:

\+(?=\+) - plus followed by another plus (positive lookahead)
| - or
\+(?!/) - plus not followed by a forward slash (negative lookahead)
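
To see which plus signs the two alternatives pick out, here is a minimal sketch that lists the index of every match. The names `positions`, `all_plus`, and `kept` are chosen for illustration only. The pluses at the matched positions act as separators; the remaining ones are kept as data.

```python
import re

pattern = r"\+(?=\+)|\+(?!/)"
s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

# Index of every '+' that the pattern treats as a separator.
positions = [m.start() for m in re.finditer(pattern, s)]
# Index of every '+' in the string, for comparison.
all_plus = [i for i, ch in enumerate(s) if ch == "+"]
# The pluses the pattern skips are the ones kept as data.
kept = [i for i in all_plus if i not in positions]

print("separators at:", positions)
print("kept pluses at:", kept)
```

Every index in `kept` should point at a plus that is directly followed by a slash, which is exactly the case the negative lookahead excludes.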

Code:

import re

pattern = r"\+(?=\+)|\+(?!/)"
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

print([s.split("/") for s in re.split(pattern, string)])

Output:

[['apple', 'SP'], ['+', 'SW'], ['orange', 'NNG'], ['+', 'FG'], ['melon', 'SL'], ['food', 'JKG']]
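
The question asks for tuples rather than lists, which only needs a `tuple(...)` around each split. As a side note, a plus followed by another plus is necessarily a plus not followed by a slash, so the shorter pattern `\+(?!/)` should split at the same places. A sketch under that reasoning (the name `short_pattern` is introduced here and is not part of the code above):

```python
import re

pattern = r"\+(?=\+)|\+(?!/)"
short_pattern = r"\+(?!/)"  # assumption being checked: the first alternative is redundant
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

# Same approach as above, but each [word, tag] list is converted to a tuple.
pairs = [tuple(s.split("/")) for s in re.split(pattern, string)]
short_pairs = [tuple(s.split("/")) for s in re.split(short_pattern, string)]

print(pairs)
print(pairs == short_pairs)  # True if both patterns split identically
```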

David · Answer 2 · 2021-12-13T00:18:26.583

Here is one solution:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.replace("++", "+/*")
x = x.split('+')
x = [item.replace("*", "+") for item in x]
x = [item.split('/') for item in x]
y = []
for item in x:
    y += item
# remove the empty strings produced by splitting '/+/SW' on '/'
y = [item for item in y if item != '']
# modified from https://stackoverflow.com/questions/53990075/convert-list-into-list-of-tuples-of-every-two-elements
# pair up consecutive elements: (y[0], y[1]), (y[2], y[3]), ...
out = []
for i in range(0, len(y) - 1, 2):
    out.append((y[i], y[i + 1]))
print(out)

Result:

[('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
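
The same replace-then-split idea can be written more compactly. This sketch is a variation, not the code above: it assumes the NUL character `"\x00"` never occurs in the input and uses it as a stand-in for the literal plus. The names `PLACEHOLDER` and `split_tagged` are made up for the example.

```python
PLACEHOLDER = "\x00"  # assumption: this character never appears in the input


def split_tagged(s):
    """Split 'word/TAG+word/TAG' text into (word, tag) tuples, keeping a literal '+' token."""
    # In '++' the first plus separates and the second is data, so protect the second.
    protected = s.replace("++", "+" + PLACEHOLDER)
    pairs = []
    for token in protected.split("+"):
        # Restore the protected plus, then separate the word from its tag.
        word, tag = token.replace(PLACEHOLDER, "+").split("/")
        pairs.append((word, tag))
    return pairs


print(split_tagged("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"))
```

Because each token is unpacked into exactly two names, a token without exactly one slash raises a `ValueError` instead of being paired up silently with a neighbour.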

score 0 · Answer 3 · answered Dec 13 '21 at 01:20

This answer is similar to the one proposed by Paul, but I think mine is simpler.

import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
pattern = r"((?:\+|\w+)\/\w+)"
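res = [tuple(m.split("/")) for m in re.findall(pattern, s)]
print(res)
#[('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]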
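
A possible variation, not part of the code above: when a pattern contains more than one capturing group, `re.findall` returns a list of tuples of those groups, so capturing the word and the tag separately removes the need for the `split("/")` step. The name `pair_pattern` is introduced here for illustration.

```python
import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
# Group 1: a literal '+' or a run of word characters; group 2: the tag after '/'.
pair_pattern = r"(\+|\w+)/(\w+)"
res = re.findall(pair_pattern, s)
print(res)
```

Like the single-group version, this skips any text that does not fit the word/tag shape, so malformed input is dropped rather than reported.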

FountainTree
