Remove regex elements from list

Question

I use Python 2.7. I have data in file 'a':

myname1@abc.com;description1
myname2@abc.org;description2
myname3@this_is_ok.ok;description3
myname5@qwe.in;description4
myname4@qwe.org;description5
abc@ok.ok;description7

I read this file like:

with open('a', 'r') as f:
    data = [x.strip() for x in f.readlines()]

I have a list named bad:

bad = ['abc', 'qwe'] # could be more than 20 elements

Now I'm trying to remove all lines that have 'abc' or 'qwe' right after the @ and write the rest to newfile. So newfile should contain only 2 lines:

myname3@this_is_ok.ok;description3
abc@ok.ok;description7

I've been trying to use the regexp (.?)@(.?);(.*) to get groups, but I don't know what to do next.

Please advise!

http://stackoverflow.com/questions/11328940/check-if-list-item-contains-items-from-another-list/11329368#11329368 – Tisho, Jul 05 '12 at 07:19
Tisho, I've been there. But the problem is that I have to use a regex to make groups for checking. Or maybe there's another way that I don't know about. – Alex, Jul 05 '12 at 07:25

score 3 · Answer 1 · answered Jul 05 '12 at 07:59

Here's a non-regex solution:

bad = set(['abc', 'qwe'])

with open('a', 'r') as f:
    # Keep only lines whose domain name (the part between '@' and the first '.') is NOT in bad
    data = [line.strip() for line in f if line.split('@')[1].split('.')[0] not in bad]
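
A slightly more defensive sketch of the same non-regex idea, written so it runs on both Python 2.7 and 3. The helper names keep_line and filter_lines are made up for illustration, and it assumes every data line has the form name@domain;description. It uses partition instead of split so that a line without '@' does not raise an IndexError.

```python
def keep_line(line, bad):
    """Return True if the first label of the domain after '@' is not in bad."""
    # partition never raises: a line without '@' simply yields an empty domain
    domain = line.partition('@')[2].split(';')[0]
    return domain.split('.')[0] not in bad


def filter_lines(lines, bad):
    """Strip each line, skip blank ones, and keep those with an allowed domain."""
    bad = set(bad)  # set membership tests stay cheap even with many entries
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped and keep_line(stripped, bad):
            kept.append(stripped)
    return kept


sample = [
    'myname1@abc.com;description1\n',
    'myname2@abc.org;description2\n',
    'myname3@this_is_ok.ok;description3\n',
    'myname5@qwe.in;description4\n',
    'myname4@qwe.org;description5\n',
    'abc@ok.ok;description7\n',
]
kept = filter_lines(sample, ['abc', 'qwe'])
```

With the real file, the open file object can be passed as lines, and the result written out with something like outf.write('\n'.join(kept) + '\n').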

answered Jul 05 '12 at 07:59

Joel Cornett

jamylak · Accepted Answer · 2012-07-05T07:32:55.663

2

import re
bad = ['abc', 'qwe']

with open('a') as f:
    print [line.strip() 
           for line in f
           if not re.search('|'.join(bad), line.partition('@')[2])]

This solution works as long as bad contains only plain characters (e.g. letters, numbers, underscores) and nothing that has a special meaning in a regular expression, such as 'a|b', as @phihag pointed out.
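
One way to lift that restriction is to pass each entry through re.escape before joining, as in the small sketch below. build_bad_pattern is a hypothetical helper name and the sample string is invented for the demonstration.

```python
import re


def build_bad_pattern(bad):
    # re.escape makes every entry match literally, so characters such as
    # '|' or '.' inside an entry lose their regex meaning
    return re.compile('|'.join(re.escape(b) for b in bad))


# Without escaping, 'a|b' is read as "a OR b"
unescaped = re.compile('|'.join(['a|b']))
# With escaping, only the literal three-character text 'a|b' matches
escaped = build_bad_pattern(['a|b'])

after_at = 'banana.org;description'
unescaped_hit = unescaped.search(after_at) is not None
escaped_hit = escaped.search(after_at) is not None
```

Here unescaped_hit is True because the unescaped pattern matches any 'a' or 'b', while escaped_hit is False because the text contains no literal 'a|b'.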

edited Jul 05 '12 at 07:32

answered Jul 05 '12 at 07:25

jamylak

This fails for bad values of `bad`, for example `['a|b']`. – phihag Jul 05 '12 at 07:33
@phihag Agreed, I will note that. – jamylak Jul 05 '12 at 07:34

phihag · Answer 3 · 2012-07-05T07:32:01.890

0

The regexp .? matches either zero or one character. You want .*?, which is a lazy match of any number of characters:

import re
bad = ['abc', 'qwe']

filterf = re.compile('(.*?)@(?!' + '|'.join(map(re.escape, bad)) + ')').match
with open('a') as inf, open('newfile', 'w') as outf:
    outf.writelines(filter(filterf, inf))
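
To make the difference between .? and .*? concrete, here is a small sketch on one line of the sample data. The variable names are only for illustration.

```python
import re

line = 'myname1@abc.com;description1'

# '.?' matches at most one character, so the first group cannot span 'myname1'
narrow = re.match(r'(.?)@(.?);(.*)', line)

# '.*?' takes as few characters as needed, so each group grows to the next delimiter
lazy = re.match(r'(.*?)@(.*?);(.*)', line)
groups = lazy.groups()

# The negative lookahead (?!...) rejects a line when the text right after '@'
# starts with one of the escaped bad entries
bad = ['abc', 'qwe']
keep = re.compile('(.*?)@(?!' + '|'.join(map(re.escape, bad)) + ')').match

kept_ok = keep('myname3@this_is_ok.ok;description3') is not None
kept_bad = keep(line) is not None
kept_prefix = keep('someone@abcdef.com;description') is not None
```

Here narrow is None, groups is ('myname1', 'abc.com', 'description1'), kept_ok is True and kept_bad is False. Note that kept_prefix is also False: the lookahead only checks how the domain starts, so a domain like abcdef.com is rejected as well. If that is not wanted, appending a literal dot after the alternation (for example (?!(?:abc|qwe)\.)) is one possible tightening.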

edited Jul 05 '12 at 07:32

answered Jul 05 '12 at 07:25

phihag

`'(.*?)@(?!%s)' % '|'.join(map(re.escape, bad))` – Aleksei astynax Pirogov Jul 05 '12 at 07:37

Nick · Answer 4 · 2012-07-05T08:38:07.097

I have used a regular expression to remove lines which contain @abc or @qwe. I'm not sure if it is the right method.

import re
with open('testFile.txt', 'r') as f:
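     # Caution: [^abc|qwe] is a character class (any character except a, b, c, |, q, w, e),
     # not "neither abc nor qwe", so it also drops domains such as 'web' or 'cab'
     data = [x.strip() for x in f.readlines() if re.match(r'.*@([^abc|qwe]+)\..*;.*',x)]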

print data

Now data holds the lines which do not contain '@abc' or '@qwe'.

Or use

data = [x.strip() for x in f.readlines() if re.search(r'.*@(?!abc|qwe)',x)]

Based on astynax's suggestion...
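
The two patterns in this answer are not equivalent, which a quick comparison shows. The test lines with the domains cab.net and web.net are invented for this sketch; neither domain is in the bad list.

```python
import re

# First pattern from this answer: [^abc|qwe] is a negated character class
char_class = re.compile(r'.*@([^abc|qwe]+)\..*;.*')
# Second pattern: a negative lookahead over the alternatives abc and qwe
lookahead = re.compile(r'.*@(?!abc|qwe)')

lines = [
    'user@this_is_ok.ok;description',
    'user@cab.net;description',
    'user@web.net;description',
    'user@abc.com;description',
]

kept_by_class = [x for x in lines if char_class.match(x)]
kept_by_lookahead = [x for x in lines if lookahead.search(x)]
```

kept_by_class contains only the this_is_ok.ok line, because the class forbids the individual letters a, b, c, q, w, e anywhere in the first domain label. kept_by_lookahead keeps the first three lines and drops only abc.com, which is the behaviour the question asks for.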
