1

I have two lists: list_1 and list_2 containing 32 bit - strings and each is of length 2^12. I want to sort these lists to get those pairs which contain equal bits at 20 specified positions.

list_w = []
for i in range(0, 4096):
    for j in range(0, 4096):
        if list_1[i][12:32] == list_2[j][12:32]:
           value = (list_1[i], list_2[j])
           list_w.append(value)
        else:
            continue

I have assumed 20 positions at the end and sort all 2^24 pairs. But this takes too much time. The answer was we can do this in 2^12 time by sorting or hashing how?

  • Are you sure you have to sort them? – Neil Apr 28 '22 at 05:39
  • yes I want those pairs which are equal at 20 bit positions say the last 20 bit positions. We expect around 2^4 = 16 to remain – Mathslover shah Apr 28 '22 at 05:44
  • I think this could be, _cf_, first [bit mask](https://stackoverflow.com/questions/8508799/bit-masking-in-python), then [counting duplicates](https://stackoverflow.com/questions/23240969/python-count-repeated-elements-in-the-list). – Neil Apr 28 '22 at 06:11

1 Answers1

1

Use a dictionary to map the values at these 20 positions to the list of strings that match those values at those positions.

Building a dict of lists is a very classical idiom in python:

d1 = {}
for s in list_1:
    d1.setdefault(s[12:32], []).append(s)

d2 = {}
for s in list_2:
    d2.setdefault(s[12:32], []).append(s)

Then you can find your pairs by iterating on the keys that are in both dictionaries; i.e., keys that are in the intersection of the two sets of keys.

list_w = [
    (s1, s2)
    for k in set(d1.keys()).intersection(d2.keys())
    for s1 in d1[k] for s2 in d2[k]
]

Testing:

list_1 = ['abcdefghijklmnopqrstuvwxyzabcdef', 'aaaaaaaaaaaamnopqrstuvwxyzabcdef', 'aaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb']
list_2 = ['ABCDEFGHIJKLmnopqrstuvwxyzabcdef', 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb']

...

print(d1)
# {'mnopqrstuvwxyzabcdef': ['abcdefghijklmnopqrstuvwxyzabcdef',
#                           'aaaaaaaaaaaamnopqrstuvwxyzabcdef'],
#  'bbbbbbbbbbbbbbbbbbbb': ['aaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb']}

print(d2)
# {'mnopqrstuvwxyzabcdef': ['ABCDEFGHIJKLmnopqrstuvwxyzabcdef'],
#  'bbbbbbbbbbbbbbbbbbbb': ['bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb']}

print(list_w)
# [('aaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb', 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'),
#  ('abcdefghijklmnopqrstuvwxyzabcdef', 'ABCDEFGHIJKLmnopqrstuvwxyzabcdef'),
#  ('aaaaaaaaaaaamnopqrstuvwxyzabcdef', 'ABCDEFGHIJKLmnopqrstuvwxyzabcdef')]
Stef
  • 13,242
  • 2
  • 17
  • 28