8

I have gone through "Find intersection of two lists?", "Intersection of Two Lists Of Strings", and "Getting intersection of two lists in python". However, I still could not solve the problem of finding the intersection of two lists of strings in Python.

I have two variables.

A = [['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]

B = [['23@N0'], ['12@N1']]

How can I find that '23@N0' is part of both A and B?

I tried using intersect(a, b) as described in http://www.saltycrane.com/blog/2008/01/how-to-find-intersection-and-union-of/. But when I try to convert A into a set, it throws an error:

File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

To convert this into a set, I used the method from "TypeError: unhashable type: 'list' when using built-in set function", where each sublist is converted into a tuple so that the result can be converted into a set:

result = sorted(set(map(tuple, A)), reverse=True)

However, this returns an empty set as the intersection.

Can you help me find the intersection?

Sharath Chandra
  • The fastest way to intersect a big bunch of data is to use Python sets. Python sets are hash maps, therefore they require hashing. Your problem comes from wrapping strings into lists. Lists are mutable objects, that's why they can't be hashed, while strings, being immutable, can be. – Eli Korvigo Feb 24 '15 at 08:54
  • Is there a reason you have a single string in each list? – Peter Wood Feb 24 '15 at 08:58
  • This is the dataset I have, I did not generate it, borrowed it from someone. – Sharath Chandra Feb 24 '15 at 10:02
  • @SharathChandra: what does "borrowed" mean? Have you read it from a file? What format? – jfs Feb 24 '15 at 10:20
  • related: [Flattening a shallow list in Python](http://stackoverflow.com/q/406121/4279) – jfs Feb 24 '15 at 10:40
  • @J.F.Sebastian: Yes, I have read it from a file. It is json. – Sharath Chandra Feb 25 '15 at 10:40
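
The hashing point made in the comments can be checked directly. The following is a minimal sketch using the A and B from the question; the names `list_is_hashable`, `tuples_a`, `tuples_b`, and `common` are illustrative and do not appear in the original post.

```python
A = [['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]
B = [['23@N0'], ['12@N1']]

# Strings are immutable and hashable, so they can be set members.
hash('23@N0')

# Lists are mutable and unhashable, so set(A) fails on the inner lists.
try:
    set(A)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Converting every sublist to a tuple on BOTH sides makes the items hashable.
tuples_a = set(map(tuple, A))
tuples_b = set(map(tuple, B))
common = tuples_a & tuples_b
print(common)  # {('23@N0',)}
```

Note that the result holds 1-tuples rather than bare strings. A tuple such as ('23@N0',) never compares equal to the string '23@N0', so mixing converted and unconverted data is one possible way to end up with an empty intersection.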

7 Answers

7

You can use the flatten function of the compiler.ast module (Python 2 only; the compiler package was removed in Python 3) to flatten your sublists and then apply set intersection like this:

from compiler.ast import flatten

A=[['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]
B=[['23@N0'], ['12@N1']]

a = flatten(A)
b = flatten(B)
common_elements = list(set(a).intersection(set(b)))
common_elements
['23@N0']
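
On Python 3, where the compiler package is not available, the same idea might be sketched with itertools.chain.from_iterable from the standard library. This is a substitute technique, not the code from the answer above; it flattens exactly one level of nesting, which is enough for the data in the question.

```python
from itertools import chain

A = [['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]
B = [['23@N0'], ['12@N1']]

# Flatten one level: [['x'], ['y']] -> ['x', 'y']
a = list(chain.from_iterable(A))
b = list(chain.from_iterable(B))

# intersection() accepts any iterable of hashable items
common_elements = list(set(a).intersection(b))
print(common_elements)  # ['23@N0']
```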
Anurag Sharma
2

The problem is that your lists contain sublists so they cannot be converted to sets. Try this:

A=[['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]
B=[['23@N0'], ['12@N1']]

C = [item for sublist in A for item in sublist]
D = [item for sublist in B for item in sublist]

print(set(C).intersection(set(D)))
igavriil
2

Your data structure is a bit strange, as it is a list of one-element lists of strings; you'd want to reduce it to a list of strings, and then you can apply the previous solutions.

A list like:

B = [['23@N0'], ['12@N1']]

can be converted to an iterator over '23@N0', '12@N1' with itertools.chain(*B), so we have a simple one-liner:

>>> from itertools import chain
>>> set(chain(*A)).intersection(chain(*B))
{'23@N0'}
  • This does seem to be working if A and B are reversed in the last statement. That is, if we try set(B).intersection(A), it results in an empty set. – Sharath Chandra Feb 24 '15 at 10:01
2

In case you have to fit it on a fortune cookie:

set(i[0] for i in A).intersection(set(i[0] for i in B))
jwolf
0

You have two lists of lists with one item each. In order to convert that to a set you have to make it a list of strings:

set_a = set([i[0] for i in A])
set_b = set([i[0] for i in B])

Now you can get the intersection:

set_a.intersection(set_b)
Klaus D.
0
A=[['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]
A=[a[0] for a in A]
B=[['23@N0'], ['12@N1']]
B=[b[0] for b in B]
print(set.intersection(set(A),set(B)))

Output: {'23@N0'}

If each of your lists has sublists of only one element, you can try this.
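
The one-element caveat matters. As a small sketch with hypothetical data (not the asker's dataset) in which some sublists hold more than one string, taking only the first item of each sublist can miss a shared value that full flattening finds:

```python
from itertools import chain

# Hypothetical data: some sublists contain more than one string.
A = [['11@N3', '23@N0'], ['62@N0']]
B = [['12@N1', '23@N0']]

# Taking only index 0 ignores every later item in a sublist.
first_only = set(a[0] for a in A) & set(b[0] for b in B)

# Flattening one level keeps all items.
flattened = set(chain.from_iterable(A)) & set(chain.from_iterable(B))

print(first_only)  # set()
print(flattened)   # {'23@N0'}
```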

vks
0

My preference is to use itertools.chain from the standard library:

from itertools import chain

A = [['11@N3'], ['23@N0'], ['62@N0'], ['99@N0'], ['47@N7']]

B = [['23@N0'], ['12@N1']]

set(chain(*A)) & set(chain(*B))

# {'23@N0'}
jpp