How to find the common set of patterns from two files in Python?

Question

I have file1, which contains:

- 
er
we
ds,e3,kj
uy,mn
po
qw
pi
pi,f

and file2, which contains:

- 
df
we
wr
f,pi
ds,kj,e3
rt,uy
qw
po

I tried the following code, but it's not working as intended:

my_set1 = set(x.strip() for x in (open('file1').readlines()))
print(my_set1)
my_set2 = set(x.strip() for x in (open('file2').readlines()))
print(my_set2)

my_list=list((set(my_set1).intersection(set(my_set2))))
print(my_list,"\n")

with open('common_signals','w') as file3:
    for signal in my_list:
        file3.write("%s\n" %signal)

The output I am getting inside common_signals is: po, we, qw.

It has neglected ds, kj, e3, uy, pi and f.

Can someone help with this?

Try to format the code first. – Underoos, Mar 11 '19 at 10:31

balderman · Answer 1 · 2019-03-11T10:45:50.737

You need to split each line into substrings (for example, 'ds,kj,e3' becomes 'ds', 'kj' and 'e3').

Try the function 'get_set_of_words' below.

It returns a set that you can use for the intersection.

def get_set_of_words(file_name):
    """Return the set of comma-separated words found in the file."""
    result = set()
    with open(file_name) as f:
        for line in f:
            # 'ds,kj,e3' becomes ['ds', 'kj', 'e3']
            for word in line.strip().split(','):
                result.add(word)
    return result
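
To show how the returned sets might be combined, here is a minimal sketch that intersects the two sets and writes the common words to a file. The helper `write_common` and the temporary-file demo are assumptions added for illustration; they are not part of the answer. The demo uses the word lines from the question, and the sketch also skips empty tokens so that blank lines do not end up in the result.

```python
import os
import tempfile


def get_set_of_words(file_name):
    """Return the set of comma-separated words found in file_name."""
    result = set()
    with open(file_name) as f:
        for line in f:
            for word in line.strip().split(','):
                if word:  # skip empty tokens produced by blank lines
                    result.add(word)
    return result


def write_common(file_a, file_b, out_name):
    """Hypothetical helper: write words present in both files, one per line."""
    common = sorted(get_set_of_words(file_a) & get_set_of_words(file_b))
    with open(out_name, 'w') as out:
        for word in common:
            out.write(word + '\n')
    return common


# Demo: write the sample data from the question to temporary files.
tmp_dir = tempfile.mkdtemp()
path1 = os.path.join(tmp_dir, 'file1')
path2 = os.path.join(tmp_dir, 'file2')
with open(path1, 'w') as f:
    f.write('er\nwe\nds,e3,kj\nuy,mn\npo\nqw\npi\npi,f\n')
with open(path2, 'w') as f:
    f.write('df\nwe\nwr\nf,pi\nds,kj,e3\nrt,uy\nqw\npo\n')

common_words = write_common(path1, path2, os.path.join(tmp_dir, 'common_signals'))
print(common_words)
# ['ds', 'e3', 'f', 'kj', 'pi', 'po', 'qw', 'uy', 'we']
```

Sorting the intersection is optional; it only makes the output order reproducible, since sets have no defined order.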

Vasilis G. · Answer 2 · 2019-03-11T10:51:41.970

A slightly modified version of your code will produce the desired result:

my_set1 = sum([x.strip().split(',') for x in open('file1').readlines()],[])
print(my_set1)

my_set2 = sum([x.strip().split(',') for x in open('file2').readlines()],[])
print(my_set2)

my_list=list((set(my_set1).intersection(set(my_set2))))
print(my_list,"\n")

with open('common_signals','w') as file3:
    for signal in my_list:
        file3.write("%s\n" %signal)

You need to split each line on commas, which gives a list of lists; sum(..., []) then flattens it into a single list.

Result:

-
qw
pi
kj
ds
po
e3
f
uy
we
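
As a possible variation on the flattening step in this answer, `itertools.chain.from_iterable` can replace `sum(..., [])`, which avoids building a new list for every line. This is a sketch of a swapped-in technique, not the answer's own code; the function name `tokens_from_lines` and the in-memory line lists are assumptions used so the example runs without the real files.

```python
from itertools import chain


def tokens_from_lines(lines):
    """Flatten comma-separated lines into a set of individual tokens."""
    return set(chain.from_iterable(line.strip().split(',') for line in lines))


# Sample lines from the question, as they would be read from each file.
file1_lines = ['er\n', 'we\n', 'ds,e3,kj\n', 'uy,mn\n', 'po\n', 'qw\n', 'pi\n', 'pi,f\n']
file2_lines = ['df\n', 'we\n', 'wr\n', 'f,pi\n', 'ds,kj,e3\n', 'rt,uy\n', 'qw\n', 'po\n']

common = sorted(tokens_from_lines(file1_lines) & tokens_from_lines(file2_lines))
print(common)
# ['ds', 'e3', 'f', 'kj', 'pi', 'po', 'qw', 'uy', 'we']
```

With real files, the same function could be called as `tokens_from_lines(f)` inside a `with open(...) as f:` block, which also closes the file afterwards.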

score 0 · Answer 3 · answered Mar 11 '19 at 10:43

This is because, as strings, "ds,e3,kj" and "ds,kj,e3" are not equal. If you need to compare such patterns as whole strings, sort their parts first and compare afterwards.

if ',' in line:
    line = ','.join(sorted(line.split(',')))
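
The snippet above could be wrapped into a complete comparison along these lines. This is only a sketch: the names `normalize` and `common_patterns` are made up for illustration, and the sample lines are taken from the question.

```python
def normalize(line):
    """Sort the comma-separated parts so 'ds,kj,e3' and 'ds,e3,kj' compare equal."""
    return ','.join(sorted(line.strip().split(',')))


def common_patterns(lines_a, lines_b):
    """Return whole-line patterns that occur in both inputs, ignoring part order."""
    return {normalize(a) for a in lines_a} & {normalize(b) for b in lines_b}


file1_lines = ['er', 'we', 'ds,e3,kj', 'uy,mn', 'po', 'qw', 'pi', 'pi,f']
file2_lines = ['df', 'we', 'wr', 'f,pi', 'ds,kj,e3', 'rt,uy', 'qw', 'po']

print(sorted(common_patterns(file1_lines, file2_lines)))
# ['ds,e3,kj', 'f,pi', 'po', 'qw', 'we']
```

Note that this treats each line as one pattern, so 'uy' is not reported: 'uy,mn' and 'rt,uy' are different patterns even though they share a word. If individual words should match, splitting the lines as in the other answers is the better fit.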
