Detect what is in brackets in two lines

Question

If I have text like this:

1
<src> he is a [man]</src>
<tgt>lui è un [uomo]</tgt>
2
<src> she is a [woman]</src>
<tgt>lei è una donna</tgt>
3
<src> he works well</src>
<tgt> lui lavora [bene]</tgt>

I want to detect the strings between the brackets only when brackets are present in both the src and the tgt line. In the text above, I want to detect only [man] and [uomo], because the src line contains [man] and the tgt line contains [uomo]. Can someone help me?

I tried this code:

line = str()
num = str()
line1 = str()
num1 = str()

for i, line in enumerate(file):
    lines = iter(filer1)
    if line.startswith("<src>"):
        line += '%s\n' % line.strip()
        num += '%s\n' % filer1[i-1]
    if line.startswith("<tgt>"):
        line1 += '%s\n' % line.strip()
        num1 += '%s\n' % filer1[i-2]
for l in line.splitlines():
    for ll in line1.splitlines():
        for n in num.splitlines():
            for nn in num1.splitlines():
                if n == nn:
                    m = re.findall(r"\[(.*?)\]", l)
                    mm = re.findall(r"\[(.*?)\]", ll)
                    if m and mm:
                        print '[{}]'.format(m[0]), '[{}]'.format(mm[0])
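The attempt above has a few separate problems: the name "line" is used both as the loop variable and as the accumulator, so the collected text is overwritten on every iteration; "file" and "filer1" are not defined in the snippet; and the four nested loops pair every src line with every tgt line instead of only the lines that share a number. A minimal sketch of the same number-matching idea, assuming every src/tgt pair is preceded by its number line as in the sample (the names SAMPLE, BRACKETED and bracketed_by_number are made up for illustration):

```python
import re

# Hypothetical sample input, copied from the layout shown in the question.
SAMPLE = """1
<src> he is a [man]</src>
<tgt>lui è un [uomo]</tgt>
2
<src> she is a [woman]</src>
<tgt>lei è una donna</tgt>
3
<src> he works well</src>
<tgt> lui lavora [bene]</tgt>"""

# Non-greedy match of whatever sits between [ and ].
BRACKETED = re.compile(r"\[(.*?)\]")


def bracketed_by_number(lines):
    """Map each segment number to (src_matches, tgt_matches)."""
    src, tgt = {}, {}
    num = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("<src>"):
            src[num] = BRACKETED.findall(line)
        elif line.startswith("<tgt>"):
            tgt[num] = BRACKETED.findall(line)
        elif line:
            # Any other non-empty line is assumed to be a segment number.
            num = line
    # Keep only numbers where both src and tgt contain bracketed text.
    return {n: (src[n], tgt[n]) for n in src
            if n in tgt and src[n] and tgt[n]}


if __name__ == "__main__":
    for n, (s, t) in bracketed_by_number(SAMPLE.splitlines()).items():
        print(n, "[{}]".format(s[0]), "[{}]".format(t[0]))
```

Keying both dictionaries by number means a src line can only ever be compared with the tgt line of the same segment, which is what the nested loops were trying to achieve.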

Do you want all strings in square brackets anywhere or only strings within your angle-bracketed tags that are within square brackets? Also, you're supposed to come here once you're stuck, not before you start (you have no code; you're just asking us to do it for you). — Two-Bit Alchemist, Apr 29 '14 at 22:11

score 1 · Accepted Answer · answered Apr 29 '14 at 22:16

Basically, what you should do is: first, clean up your text input so that you have a list of lists, where each sublist contains a src line and a tgt line. Then loop over the pairs of lines and use the re module to test for text within square brackets in both src and tgt. If both have bracketed text, display them; otherwise, skip the pair.

This is fairly straightforward and should look something like this:

import re

# see <http://stackoverflow.com/a/312464/1535629>
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i+n]

text = '''1
<src> he is a [man]</src>
<tgt>lui è un [uomo]</tgt>
2
<src> she is a [woman]</src>
<tgt>lei è una donna</tgt>
3
<src> he works well</src>
<tgt> lui lavora [bene]</tgt>'''
lines = text.split('\n')
# drop the number line from each 3-line chunk, keeping [src, tgt]
linepairs = [chunk[1:] for chunk in chunks(lines, 3)]

regex = re.compile(r'\[\w*\]')
for src, tgt in linepairs:
    src_match = regex.search(src)
    tgt_match = regex.search(tgt)
    if src_match and tgt_match:
        print(src_match.group(), tgt_match.group())

Result:

[man] [uomo]
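The chunking above relies on every record being exactly three lines, and the pattern \[\w*\] only matches a single word, so a bracket such as [young man] would be missed. A sketch of an alternative that pairs each <src> line with the <tgt> line that follows it, whatever the surrounding lines are, and uses \[(.*?)\] with findall so multi-word and multiple bracketed strings are kept (the function name paired_brackets and the sample text are hypothetical):

```python
import re

# Non-greedy: matches anything between [ and ], including spaces.
BRACKETED = re.compile(r"\[(.*?)\]")


def paired_brackets(lines):
    """Yield (src_items, tgt_items) for each <src>/<tgt> pair where both
    lines contain at least one bracketed string."""
    pending_src = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("<src>"):
            pending_src = line
        elif line.startswith("<tgt>") and pending_src is not None:
            src_items = BRACKETED.findall(pending_src)
            tgt_items = BRACKETED.findall(line)
            if src_items and tgt_items:
                yield src_items, tgt_items
            pending_src = None  # each src line pairs with one tgt line only


# Hypothetical input with a multi-word bracket.
text = """1
<src> he is a [young man]</src>
<tgt>lui è un [giovane uomo]</tgt>
2
<src> she works well</src>
<tgt> lei lavora [bene]</tgt>"""

for src_items, tgt_items in paired_brackets(text.splitlines()):
    print(src_items, tgt_items)
```

Because pairing is driven by the tags rather than by position, blank lines or missing number lines do not shift the src/tgt alignment.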

score 0 · Answer 2 · answered Apr 29 '14 at 23:17

Assuming that your file strictly follows the three-line pattern, you could do:

# assumes Python 2.7
from itertools import izip_longest
import re

INPUT = "translations.txt"

# from itertools documentation
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

in_brackets = re.compile(r"\[(.*?)\]").search

def main():
    with open(INPUT) as inf:
        for num,en,it in grouper(inf, 3, ""):
            en = in_brackets(en)
            it = in_brackets(it)
            if en and it:
                print("[{}] -> [{}]".format(en.group(1), it.group(1)))

main()
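The answer above targets Python 2.7. In Python 3, itertools.izip_longest is named itertools.zip_longest, and open() decodes text with a platform-dependent default encoding, so passing encoding="utf-8" is safer for the accented Italian lines. A sketch of a Python 3 port under those assumptions, with the matching loop split into a hypothetical bracket_pairs generator so it can be tried on an in-memory list:

```python
# Python 3 port: izip_longest was renamed zip_longest.
from itertools import zip_longest
import re

INPUT = "translations.txt"  # file name from the answer above; adjust as needed

in_brackets = re.compile(r"\[(.*?)\]").search


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def bracket_pairs(lines):
    """Yield (english, italian) bracket contents for each 3-line record."""
    for _num, en, it in grouper(lines, 3, ""):
        en_match = in_brackets(en)
        it_match = in_brackets(it)
        if en_match and it_match:
            yield en_match.group(1), it_match.group(1)


def main(path=INPUT):
    # encoding="utf-8" so the accented Italian text decodes on every platform
    with open(path, encoding="utf-8") as inf:
        for en, it in bracket_pairs(inf):
            print("[{}] -> [{}]".format(en, it))


if __name__ == "__main__":
    # Demo on in-memory lines; call main() to read the file instead.
    sample = ["1\n", "<src> he is a [man]</src>\n", "<tgt>lui è un [uomo]</tgt>\n"]
    for en, it in bracket_pairs(sample):
        print("[{}] -> [{}]".format(en, it))
```

Like the original, this still assumes the file strictly follows the number / src / tgt pattern; an incomplete final record is padded with empty strings by grouper and simply produces no match.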
