I need to transform the string 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG' into a list of tuples: [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]. My first thought was to split the string on '+' and then split each piece on '/'.

The problem is the doubled plus signs. In each '++', the first plus is a separator and the second is a token I need to keep. Splitting on '+' alone removes every plus sign:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.split('+')
print(x)
#['apple/SP', '', '/SW', 'orange/NNG', '', '/FG', 'melon/SL', 'food/JKG']

If I split on '++' instead:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
splitted_s = s.split('++')
print(splitted_s)
#['apple/SP', '/SW+orange/NNG', '/FG+melon/SL+food/JKG']

I have no idea how to get to the result [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')].

halfer
july

3 Answers

You could use a regular expression:

  • \+(?=\+) - plus followed by another plus (positive lookahead)
  • | - or
  • \+(?!/) - plus not followed by a forward slash (negative lookahead)
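To check which plus signs the pattern treats as separators, the two alternatives can be run over the sample string with re.finditer. This is only an illustrative sketch; the variable names all_plus, separators and kept are assumptions made for the example.

```python
import re

pattern = r"\+(?=\+)|\+(?!/)"
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

# Index of every plus sign in the string
all_plus = [i for i, ch in enumerate(string) if ch == "+"]
# Indexes the pattern matches, i.e. the pluses consumed as separators
separators = [m.start() for m in re.finditer(pattern, string)]
# The remaining pluses are the ones that survive as tokens
kept = [i for i in all_plus if i not in separators]

print(separators)
print(kept)
```

In each '++' pair only the first plus is matched, so the second one stays attached to its '/TAG' part.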

Code:

import re

pattern = r"\+(?=\+)|\+(?!/)"
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

print([s.split("/") for s in re.split(pattern, string)])

Output:

[['apple', 'SP'], ['+', 'SW'], ['orange', 'NNG'], ['+', 'FG'], ['melon', 'SL'], ['food', 'JKG']]
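The output above is a list of lists, while the question asks for tuples. A minimal sketch of the same approach that wraps each split in tuple() follows; the helper name parse_tagged is hypothetical and not part of the answer.

```python
import re


def parse_tagged(text):
    """Hypothetical helper: turn 'word/TAG' items joined by '+' into tuples."""
    # Same pattern as above: a plus followed by a plus, or a plus not followed by '/'
    pattern = r"\+(?=\+)|\+(?!/)"
    return [tuple(item.split("/")) for item in re.split(pattern, text)]


result = parse_tagged("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG")
print(result)
```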
Paul M.

Here is one solution. It uses '*' as a temporary placeholder, so it assumes '*' never appears in the input:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.replace("++", "+/*")
x = x.split('+')
x = [item.replace("*", "+") for item in x]
x = [item.split('/') for item in x]
# flatten the sublists, dropping the empty strings left by the leading '/'
y = [part for item in x for part in item if part != '']
# modified from https://stackoverflow.com/questions/53990075/convert-list-into-list-of-tuples-of-every-two-elements
# pair each even-indexed element with the element that follows it
out = [(y[i], y[i + 1]) for i in range(0, len(y) - 1, 2)]
print(out)

Result:

[('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
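The pairing step at the end can also be written with slicing and zip. This is a sketch of that alternative, not the code from the answer; y is hard-coded here to the flattened list the earlier steps produce so that the snippet runs on its own.

```python
# Flattened token/tag list, as produced by the steps above (hard-coded for the sketch)
y = ['apple', 'SP', '+', 'SW', 'orange', 'NNG', '+', 'FG', 'melon', 'SL', 'food', 'JKG']

# y[::2] holds the tokens, y[1::2] holds the tags; zip pairs them up in order
out = list(zip(y[::2], y[1::2]))
print(out)
```

If y had an odd number of elements, zip would silently drop the last one, which matches the behaviour of the index-based loop.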
David

This answer is similar to the one proposed by Paul, but I think mine is simpler.

import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
pattern = r"((?:\+|\w+)\/\w+)"
res = [tuple(m.split("/")) for m in re.findall(pattern, s)]
print(res)
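Note that \w+ only matches letters, digits and underscores, so a token such as 'ice-cream' would be cut short. A possible variant, assuming tokens and tags never contain '+' or '/', uses two capture groups so that re.findall returns the tuples directly. The name loose_pattern and the second sample string are made up for illustration.

```python
import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
# Group 1: a literal '+' token or any run of characters other than '+' and '/'
# Group 2: the tag, again any run of characters other than '+' and '/'
loose_pattern = r"(\+|[^+/]+)/([^+/]+)"

# With two capture groups, findall yields one (token, tag) tuple per match
res = re.findall(loose_pattern, s)
print(res)

# A token containing a hyphen is kept whole
hyphenated = re.findall(loose_pattern, "ice-cream/NNG+tea/NNG")
print(hyphenated)
```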
FountainTree