
I got some random text by crawling, but I have to remove every value inside brackets,

like this:

string = "I(have[to]re(m[ov]e)all)value(in)[bracket]"

should become string = "Ivalue"

Because I have to remove the bracketed values from every crawled string, I collected the indices of "[(" and "])". In the single-bracket case this works well, but with double or triple (nested) brackets it doesn't.

My code is like this:

for i in range(len(contents_re[0])):
    for j in range(len(trash_index[i][0])-1):
        if trash_index[i][0][j+1] < trash_index[i][1][j]:
            trash_index_re[i][0].append(trash_index[i][0][j])
            trash_index_re[i][1].append(trash_index[i][1][j+1])
        elif trash_index[i][0][j+1] > trash_index[i][1][j]:
            trash_index_re[i][0].append(trash_index[i][0][j])
            trash_index_re[i][1].append(trash_index[i][1][j])

From the index data (trash_index[i][0] contains the indices of "(["), I tried to find the final indices (trash_index_re), e.g. "([()])blah()" -> "( )blah()".
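A depth counter can find the outermost bracket groups directly, at any nesting depth, instead of comparing neighbouring indices. This is a minimal sketch, assuming round and square brackets are interchangeable and always balanced; `outer_spans` and `strip_spans` are hypothetical names, not part of the code above:

```python
def outer_spans(s):
    """Return (start, end) index pairs of the outermost bracket groups.

    Assumes '(' / '[' open and ')' / ']' close, the brackets are balanced,
    and the two kinds are treated as interchangeable.
    """
    spans = []
    depth = 0
    start = None
    for i, c in enumerate(s):
        if c in '([':
            if depth == 0:
                start = i                  # an outermost group begins here
            depth += 1
        elif c in ')]':
            depth -= 1
            if depth == 0:
                spans.append((start, i))   # its matching outermost close
    return spans


def strip_spans(s, spans):
    """Remove every s[start:end + 1] listed in spans (spans must be sorted)."""
    pieces = []
    prev = 0
    for start, end in spans:
        pieces.append(s[prev:start])
        prev = end + 1
    pieces.append(s[prev:])
    return ''.join(pieces)


text = "I(have[to]re(m[ov]e)all)value(in)[bracket]"
spans = outer_spans(text)
print(spans)                     # [(1, 23), (29, 32), (33, 41)]
print(strip_spans(text, spans))  # Ivalue
```

For "([()])blah()" the same function returns [(0, 5), (10, 11)], i.e. only the two outer pairs.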

Mazdak

3 Answers


This is a direct solution that splits the string into tokens and then iterates over them. It doesn't handle mismatched brackets, and it is most likely not the best approach.

import re

s = "I(have[to]re(m[ov]e)all)value(in)[bracket]"
t = []

depth = 0
# The capturing group makes re.split keep each bracket as its own token.
for token in re.split(r'([\[\]\(\)])', s):
    if token in ('(', '['):
        depth += 1
    # Keep only the tokens that sit outside every bracket pair.
    if depth == 0:
        t.append(token)
    if token in (')', ']'):
        depth -= 1

print(''.join(t))

prints:

Ivalue
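To cover the mismatched-bracket case mentioned above, one option (a sketch, not part of the original answer) is to replace the depth counter with a stack of open brackets and raise ValueError when a closer doesn't match. Unlike the code above, this version treats ( and [ as distinct kinds; `remove_bracketed` and `PAIRS` are hypothetical names:

```python
import re

# Maps each closing bracket to the opening bracket it must match.
PAIRS = {')': '(', ']': '['}


def remove_bracketed(s):
    stack = []   # opening brackets still waiting for their close
    kept = []
    for token in re.split(r'([\[\]()])', s):
        if token in ('(', '['):
            stack.append(token)
        elif token in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[token]:
                raise ValueError('mismatched %r in %r' % (token, s))
        elif not stack:
            kept.append(token)   # plain text outside every bracket
    if stack:
        raise ValueError('unclosed %r in %r' % (stack[-1], s))
    return ''.join(kept)


print(remove_bracketed("I(have[to]re(m[ov]e)all)value(in)[bracket]"))  # Ivalue
```

Inputs such as "a(b]c", "a(b" or "a)b" raise ValueError instead of silently producing a wrong result.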
Dan D.

Since you may have many levels of nested brackets, you can use a recursive function with re.sub that removes the innermost bracket groups in each call:

>>> import re
>>> def spliter(s):
...     # Remove every (...) or [...] group with no nested bracket of its own kind.
...     new = re.sub(r"\([^()]*?\)|\[([^\[\]]*?)\]", "", s)
...     # Recurse while brackets remain and the last pass removed something;
...     # the second check stops infinite recursion on unbalanced input.
...     if new != s and any(c in new for c in ')[]('):
...         return spliter(new)
...     return new
... 
>>> s="I(have[to]re(m[ov]e)all)value(in)[bracket]"
>>> spliter(s)
'Ivalue'
>>> s="I(have[to]re(m[ov]e)all(asfref)(efsf[ekjfrehf]sef)(dfsd))value(in)[bracket]"
>>> spliter(s)
'Ivalue'
>>> s="I(have[to]re(m[ov]e)all(asfref)(efsf[ekj(9387jjdf)frehf]sef)(dfsd))value(in)[bracket]"
>>> spliter(s)
'Ivalue'

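The regex

r"\([^()]*?\)|\[([^\[\]]*?)\]"

matches a pair of parentheses with no parentheses inside, or a pair of square brackets with no square brackets inside, and re.sub replaces each match with an empty string. Each call therefore strips the innermost groups, and the function keeps recursing on the result while brackets remain.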
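The same peeling idea can also be written as a loop instead of recursion, which sidesteps Python's recursion limit on very deeply nested input. A minimal sketch, assuming the same regex (without the unused capture group); re.subn returns the new string together with the number of substitutions, and the loop stops once a pass removes nothing, so unbalanced input comes back unchanged rather than looping forever. `INNER` and `strip_nested` are hypothetical names:

```python
import re

# A (...) group with no parentheses inside, or a [...] group with no
# square brackets inside: equivalent to the regex used by spliter.
INNER = re.compile(r"\([^()]*?\)|\[[^\[\]]*?\]")


def strip_nested(s):
    while True:
        s, count = INNER.subn("", s)
        if count == 0:      # nothing left to peel off
            return s


print(strip_nested("I(have[to]re(m[ov]e)all)value(in)[bracket]"))  # Ivalue
```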

Mazdak

Your question is basically about removing text within nested brackets. Assuming only round or square brackets, always properly closed, you can use the following code, as mentioned in this question:

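import itertools

def remove_brackets(t):
    # Record the bracket depth of every character; an opening or closing
    # bracket counts as being inside its own pair.
    l = []
    d = 0
    for c in t:
        if c == '[' or c == '(':
            d += 1
        l.append(d)
        if c == ']' or c == ')':
            d -= 1
    # Group consecutive characters by "inside brackets or not" and keep
    # only the runs that lie outside every bracket.
    p = []
    for inside, g in itertools.groupby(zip(t, l), lambda x: x[1] > 0):
        if not inside:
            p.append(''.join(c for c, _ in g))
    return ''.join(p)

string = "I(have[to]re(m[ov]e)all)value(in)[bracket]"
print(remove_brackets(string))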
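To make the depth bookkeeping concrete, here is the per-character depth list for a short illustrative string, followed by the runs that itertools.groupby produces when grouping on depth > 0. A bracket counts as inside its own pair, so the whole bracketed run lands in one group:

```python
import itertools

t = "a(b[c])d"
depths = []
d = 0
for c in t:
    if c in '[(':
        d += 1          # opening bracket is already "inside"
    depths.append(d)
    if c in '])':
        d -= 1          # closing bracket is still "inside"
print(depths)   # [0, 1, 1, 2, 2, 2, 1, 0]

# Pair each character with its depth and split into inside/outside runs.
runs = [(inside, ''.join(ch for ch, _ in g))
        for inside, g in itertools.groupby(zip(t, depths), lambda x: x[1] > 0)]
print(runs)     # [(False, 'a'), (True, '(b[c])'), (False, 'd')]
```

Joining only the runs marked False gives "ad".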

If you only want to remove text within single (non-nested) brackets, a regex is enough:

import re
result = re.sub(r"[\(\[].*?[\)\]]", "", string)
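A quick check of why the single-level regex only suits non-nested brackets: the lazy .*? stops at the first closing bracket of either kind, so on nested input it leaves stray closers behind. The first test string here is an illustrative example, not from the question:

```python
import re

SINGLE = r"[\(\[].*?[\)\]]"

print(re.sub(SINGLE, "", "I(have)value[in]brackets"))
# Ivaluebrackets
print(re.sub(SINGLE, "", "I(have[to]re(m[ov]e)all)value(in)[bracket]"))
# Iree)all)value
```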
Community
sgp
  • @thankyouman This doesn't give you what you want. It returns `'Iree)all)value'` – Mazdak Jun 08 '15 at 10:02
  • Oops, my bad. I read the question as asking to remove the brackets from the string. I'll make the proper edits – sgp Jun 08 '15 at 10:04
  • @Kasra is it fine now? – sgp Jun 08 '15 at 10:47
  • @sgp Good job, but you have made it very complicated!!! Do you think this answer is a proper way of solving this problem??? – Mazdak Jun 08 '15 at 10:51
  • @Kasra Code-wise, maybe, but logic-wise it is quite straightforward, isn't it? – sgp Jun 08 '15 at 10:53
  • @sgp Programming is all about efficiency ;) Check the other answers and give a better answer than them, then I'll be the first upvoter ;) Your code is terribly inefficient! – Mazdak Jun 08 '15 at 11:01
  • `list`, `groupby`, `max`, `append`, `join`, `zip`: you have a collection of functions and also other weaknesses!!! – Mazdak Jun 08 '15 at 11:05