How to remove strings from in between brackets with regex...python

Question

I need to pull out a single string containing the words from extracted fields:

[[cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool]]

So from this I need: cat dog mouse apple banana pear plum pool.

I've been trying for 2 hours to make a regular expression for this.

The best I get is (?<=[[]\S)(.*)(?=]]) which gets me:

cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool

Any ideas? Thanks!

A simple search for characters would do. `/[a-z]+/g`. [Demo](https://regex101.com/r/cX0hA0/1) — , Feb 02 '16 at 22:42
Possible duplicate of [Difference between .\*? and .\* for regex](http://stackoverflow.com/questions/3075130/difference-between-and-for-regex) — HamZa, Feb 02 '16 at 22:43
This really looks like an XY problem where you've created some badly formed data and now need to get at the information. Where is the data coming from? — the Tin Man, Feb 02 '16 at 23:27
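
The greedy-versus-lazy point raised in the linked duplicate can be illustrated with a minimal sketch. The lookaround pattern is a simplified form of the one in the question, and the shortened string `s` is an assumption made for illustration only:

```python
import re

# Shortened sample input (assumption: same shape as the data in the question).
s = "[[cat]][[dog]][[mouse]]"

# Greedy: .* runs to the end of the string and backtracks only as far as the
# LAST "]]", so everything between the first "[[" and the last "]]" is one match.
greedy = re.findall(r"(?<=\[\[)(.*)(?=\]\])", s)

# Lazy: .*? grows one character at a time and stops at the FIRST "]]",
# which yields one match per bracket pair.
lazy = re.findall(r"(?<=\[\[)(.*?)(?=\]\])", s)

print(greedy)  # ['cat]][[dog]][[mouse']
print(lazy)    # ['cat', 'dog', 'mouse']
```

Adding the single `?` after `.*` is the only difference between the two patterns.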

score 1 · Answer 1 · answered Feb 02 '16 at 22:46

Here's a solution with re.finditer. Let your string be s. This assumes there can be anything in between [[ and ]]. Otherwise, the comment by @noob applies.

>>> [x.group(1) for x in re.finditer(r'\[\[(.*?)\]\]', s)]
['cat', 'dog', 'mouse', 'apple', 'banana', 'pear', 'plum', 'pool']

Alternatively, with lookarounds and re.findall:

>>> re.findall(r'(?<=\[\[).*?(?=\]\])', s)
['cat', 'dog', 'mouse', 'apple', 'banana', 'pear', 'plum', 'pool']

For large strings, the finditer version seemed to be slightly faster when I timed the alternatives.

In [5]: s=s*1000
In [6]: timeit [x.group(1) for x in re.finditer('\[\[(.*?)\]\]', s)]
100 loops, best of 3: 3.61 ms per loop
In [7]: timeit re.findall('(?<=\[\[).*?(?=\]\])', s)
100 loops, best of 3: 5.93 ms per loop
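
To repeat this comparison outside IPython, something like the following sketch with the standard `timeit` module might do. The helper names `with_finditer` and `with_findall` are hypothetical, and precompiling the patterns is a choice made here, not part of the session above. Absolute timings depend on the machine and Python version, so none are shown:

```python
import re
import timeit

# Same test string as above, repeated 1000 times.
s = "[[cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool]]" * 1000

# Precompiled here for convenience (assumption: not how the original was timed).
capture = re.compile(r"\[\[(.*?)\]\]")
lookaround = re.compile(r"(?<=\[\[).*?(?=\]\])")

def with_finditer(text):
    # Capture-group version: pull group 1 out of each match object.
    return [m.group(1) for m in capture.finditer(text)]

def with_findall(text):
    # Lookaround version: the whole match is already the wanted word.
    return lookaround.findall(text)

# Sanity check that both approaches agree before timing them.
same = with_finditer(s) == with_findall(s)

t_finditer = timeit.timeit(lambda: with_finditer(s), number=20)
t_findall = timeit.timeit(lambda: with_findall(s), number=20)

print("results agree:", same)
print("finditer: %.4f s for 20 runs" % t_finditer)
print("findall:  %.4f s for 20 runs" % t_findall)
```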

score 1 · Answer 2 · answered Feb 02 '16 at 22:51

A simple re.split will work:

>>> s = '[[cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool]]'
>>> import re
>>> print(re.split(r'[\[\]]{2,4}', s)[1:-1])
['cat', 'dog', 'mouse', 'apple', 'banana', 'pear', 'plum', 'pool']
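
The `[1:-1]` slice is there because the string starts and ends with a delimiter, so the split produces an empty string at each end. A small sketch of that, using a shortened input for illustration; filtering out empty pieces is shown as an alternative to slicing and is an addition here, not part of the answer above:

```python
import re

# Shortened sample input (assumption: same shape as the data in the question).
s = "[[cat]][[dog]][[mouse]]"

# Split on runs of 2 to 4 bracket characters: "[[", "]][[" and "]]".
parts = re.split(r"[\[\]]{2,4}", s)
print(parts)  # ['', 'cat', 'dog', 'mouse', '']

# Dropping empty strings gives the same result as parts[1:-1] for this input.
words = [p for p in parts if p]
print(words)  # ['cat', 'dog', 'mouse']
```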

midori

Prune · Answer 3 · 2016-02-02T22:58:50.933

0

Do you have to do it with a regular expression?

extract = "[[cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool]]"
word_list = [word for word in extract.replace('[', '').split(']') if word != '']
print(word_list)

Output:

['cat', 'dog', 'mouse', 'apple', 'banana', 'pear', 'plum', 'pool']

Got it with regular expressions now. Simply find non-empty strings of stuff without brackets.

import re

target = "[[cat]][[dog]][[mouse]][[apple]][[banana]][[pear]][[plum]][[pool]]"
word_list = ' '.join(re.findall(r"[^\[\]]+", target))
print(word_list)

Edited to return the single string, rather than a list of strings.
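
For reuse, the same idea could be wrapped in a small function. This is only a sketch: the name `join_bracketed` and the `sep` parameter are hypothetical, and the extra inputs are made up to show how the pattern behaves:

```python
import re

def join_bracketed(text, sep=" "):
    """Join every run of non-bracket characters in text with sep."""
    return sep.join(re.findall(r"[^\[\]]+", text))

basic = join_bracketed("[[cat]][[dog]][[mouse]]")
print(basic)  # cat dog mouse

# Fields containing spaces are kept whole, since only brackets end a run.
spaced = join_bracketed("[[ice cream]][[pie]]")
print(spaced)  # ice cream pie

# Caveat: text outside the brackets is picked up as well.
stray = join_bracketed("x[[cat]]")
print(stray)  # x cat
```

If stray text outside the brackets is possible, a pattern anchored on `[[` and `]]` would be the safer choice.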

edited Feb 02 '16 at 22:58

answered Feb 02 '16 at 22:43

Prune


No, I don't have to. I had been solving a few of my text-cleaning issues with them, so I just kept trying them. This did work, though. Thanks! – Feb 02 '16 at 22:46
