Split element in list while keeping delimiter

Question

I have a list whose elements I want to split further, but splitting drops the delimiter, so I can't keep the data in a good format. Below is an example of what I'm doing:

data = ['x', '10[mm]', 'y', '15[mm]']
Data = [item.split('[') for item in data]
# Output: [['x'], ['10', 'mm]'], ['y'], ['15', 'mm]']]
# i.e. once flattened, ['x', '10', 'mm]', 'y', '15', 'mm]']: the '[' is lost

What I want the output to be is:

Data = ['x', '10', '[mm]', 'y', '15', '[mm]']

I've seen scripts that keep the delimiter with code like the following, or with re.split, but I don't know how to apply that to what I have so far:

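d = ">"
all_lines = ["a>b>c"]  # hypothetical sample input
for line in all_lines:
    # split on d, drop empty pieces, and re-append d to each piece
    s = [e + d for e in line.split(d) if e]
    print(s)  # ['a>', 'b>', 'c>']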
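For reference, `re.split` keeps the delimiter when the pattern wraps it in a capturing group: the captured text is returned as its own list item. A minimal sketch applying that to the sample data, assuming each unit is written as a single `[...]` group; the empty strings `re.split` leaves behind when the match sits at the very end of an item are filtered out.

```python
import re

data = ['x', '10[mm]', 'y', '15[mm]']

# The capturing group (...) makes re.split also return the matched "[...]" text
unit = re.compile(r'(\[[^\]]*\])')

# Flatten the pieces of every item and drop the empty strings
out = [part for item in data for part in unit.split(item) if part]
print(out)  # ['x', '10', '[mm]', 'y', '15', '[mm]']
```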

score 1 · Answer 1 · answered Sep 19 '22 at 21:38

Yet another variation, subject to the same assumptions as the earlier answers: at most one bracketed part per item, and it comes at the end.

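import re

data = ['x', '10[mm]', 'y', '15[mm]']
r = []
for item in data:
    # group 1: text before the first "[...]"; group 2: the "[...]" itself
    m = re.match(r'(.*?)(\[.*?\])', item)
    if m:
        r.append(m[1])
        r.append(m[2])
    else:
        # no bracketed part: keep the item unchanged
        r.append(item)
print(r)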

Prints:

['x', '10', '[mm]', 'y', '15', '[mm]']

score 0 · Answer 2 · answered Sep 19 '22 at 21:26

Try this (the pattern may need adjusting for your real data):

import re

data = ["x", "10[mm]", "y", "15[mm]"]

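# alternatives are tried left to right at each position: "[mm]", then a run of digits, then the rest of the string
pat = re.compile(r"\[mm\]|\d+|.+")
# flatten: collect every match from every string into one list
out = [p for s in data for p in pat.findall(s)]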
print(out)

Prints:

['x', '10', '[mm]', 'y', '15', '[mm]']
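The pattern above hard-codes `[mm]`, and the catch-all `.+` swallows anything it doesn't expect, so `'2.5[cm]'` would come out as `['2', '.5[cm]']`. One possible adjustment, sketched under the assumption that units are always written as a single `[...]` group: match either a bracketed group or a run of characters that contains no `[`. Note that an unmatched `[` would be silently dropped by this pattern.

```python
import re

data = ["x", "10[mm]", "y", "2.5[cm]"]

# Either a bracketed group such as "[cm]", or a run of characters containing no "["
pat = re.compile(r"\[[^\]]*\]|[^\[]+")

out = [p for s in data for p in pat.findall(s)]
print(out)  # ['x', '10', '[mm]', 'y', '2.5', '[cm]']
```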

answered Sep 19 '22 at 21:26 by Andrej Kesely

Could you elaborate on what "[p for s in data for p in pat.findall(s)]" does? I'm new to Python and these structures confuse me. – Holoptics Sep 19 '22 at 22:13
@Holoptics That's called a *list comprehension*. A list comprehension is a Pythonic way of creating or filtering lists (there are also set and dict comprehensions). More here: https://stackoverflow.com/questions/20639180/explanation-of-how-nested-list-comprehension-works – Andrej Kesely Sep 19 '22 at 22:15
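Written out as ordinary loops, the comprehension from the answer above is equivalent to the following sketch; the two `for` clauses run in the same left-to-right order as the nested loops.

```python
import re

data = ["x", "10[mm]", "y", "15[mm]"]
pat = re.compile(r"\[mm\]|\d+|.+")

# Same result as: out = [p for s in data for p in pat.findall(s)]
out = []
for s in data:                # outer clause: each string in the list
    for p in pat.findall(s):  # inner clause: each piece found in that string
        out.append(p)

print(out)  # ['x', '10', '[mm]', 'y', '15', '[mm]']
```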

CryptoFool · Accepted Answer · 2022-09-19T21:33:33.690

A regular expression match is the friend you're looking for:

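import re

data = ['x', '10[mm]', 'y', '15[mm]']

# group 1: everything before the bracketed part; group 2 (optional): the "[...]" at the end
pattern = re.compile(r"^(.*?)(\[.*\])?$")

result = []
for d in data:
    m = pattern.match(d)
    result.append(m.group(1))
    if m.group(2):  # None when the item has no bracketed part
        result.append(m.group(2))

print(result)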


Result:

    ['x', '10', '[mm]', 'y', '15', '[mm]']

There isn't a lot of variety in your test data, so it isn't clear which patterns are possible. I used a very general pattern, assuming that each value contains at most one portion in square brackets, and that if there is one, it appears at the end of the value. Within those constraints, both the text before the brackets and the text inside them can be any combination of characters.
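Since the pattern itself is the hard part to read, here is a sketch of the same regular expression written with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments, so each piece can be annotated. It is intended to behave the same as the compact form; only the layout differs.

```python
import re

pattern = re.compile(r"""
    ^            # start of the string
    (.*?)        # group 1: any characters, as few as possible (the value)
    (            # group 2: the bracketed part
        \[       #   a literal opening bracket
        .*       #   any characters
        \]       #   a literal closing bracket
    )?           # the ? makes group 2 optional, e.g. for 'x'
    $            # end of the string, so the bracketed part must come last
""", re.VERBOSE)

result = []
for d in ['x', '10[mm]', 'y', '15[mm]']:
    m = pattern.match(d)
    result.append(m.group(1))
    if m.group(2):  # None when there is no bracketed part
        result.append(m.group(2))

print(result)  # ['x', '10', '[mm]', 'y', '15', '[mm]']
```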

Thank you, this is what I was looking for. Yes, the pattern in my real data is a bit more complicated, but your pattern works just fine. I think this would be useful for other parts of my data reformatting too, but the pattern syntax is confusing me. I'll read the documentation for it when I have time later. Thank you again – Holoptics, Sep 19 '22 at 22:16

score 0 · Answer 4 · answered Sep 19 '22 at 21:34

Try this:

import re

data = ['x', '10[mm]', 'y', '15[mm]']
subs = '[mm]'
Data = []

for i in data:
    # re.escape is needed: unescaped, '[mm]' is a character class matching any 'm'
    if re.search(re.escape(subs), i):
        Data.append(i.replace(subs, ''))
        Data.append(subs)
    else:
        Data.append(i)

Output:

>>> Data
['x', '10', '[mm]', 'y', '15', '[mm]']
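The answer above only handles the literal unit `[mm]`. If the unit can be anything in brackets, a regex-free sketch is to build on the question's own `split('[')` idea with `str.partition`, which returns the separator instead of discarding it, so it can be glued back onto the unit. This assumes at most one `[` per item and that the unit is the last part of the item.

```python
data = ['x', '10[mm]', 'y', '15[mm]']

Data = []
for item in data:
    # partition splits at the first '[' and keeps it: '10[mm]' -> ('10', '[', 'mm]')
    value, sep, unit = item.partition('[')
    Data.append(value)
    if sep:  # sep is '' when the item contains no '['
        Data.append(sep + unit)

print(Data)  # ['x', '10', '[mm]', 'y', '15', '[mm]']
```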
