Python find subsets positions

Question

I have a very large list and want to find the positions at which a given sub-list occurs in it. I tried this:

l = ['7', '10', '8', '8', '6', '13', '7', '10', '13', '13', 
'7', '11', '9', '7', '15', '9', '10', '13', '6', '16']

print(set(['10', '13']).issubset(set(l)))

k= []
for i in range(0, len(l) - 1):
    if l[i] == '10' and l[i + 1] == '13':
        k.append(i)

print(k) 

#True
#[7, 16]

If the list is very large, I don't think this is the Pythonic way. Is there a better way?

Can you give example input and desired output? I'm having problems following — Mike Tung, Jan 16 '18 at 03:25
The subset may be large, and the superset may be extremely large — miket, Jan 16 '18 at 03:32
You aren't searching for the subset you think you are, as `set(['10', '10'])` is `{'10'}` — user3483203, Jan 16 '18 at 03:33
Possible duplicate of [Check for presence of a sliced list in Python](https://stackoverflow.com/questions/3313590/check-for-presence-of-a-sliced-list-in-python) — r.ook, Jan 16 '18 at 03:34
You are confused with the task, do you want to find the sub-set or sub-list? — THN, Jan 16 '18 at 03:40
Martin Broadhurst's answer in the link may be the answer, but the line `i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i]))` raises `TypeError: '>' not supported between instances of 'NoneType' and 'int'` — miket, Jan 16 '18 at 03:50
`set([l.index(i) for i in l if i in ['10','13']])` Could be tried! — Ubdus Samad, Jan 16 '18 at 04:14

score 2 · Accepted Answer · answered Jan 16 '18 at 04:04

Slice a window of length len(sl) out of the very long list vll at each position i, and check whether it equals the sub-list: sl == vll[i:i+len(sl)].

Do this for every start position, i in range(len(vll)-len(sl)+1):

vll = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', 
'7', '11', '9', '7', '15', '9', '10', '10', '6', '16']

sl = ['10', '10']

[i for i in range(len(vll)-len(sl)+1) if sl == vll[i:i+len(sl)]]

Out[986]: [7, 16]
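
The same sliding-window comparison can be wrapped in a reusable function. The sketch below is one possible variant and not part of the original answer: the name `find_sublist` is hypothetical, and it uses `list.index` to jump to the next occurrence of the first element of the sub-list before comparing a slice, which may reduce the number of slices created when that element is rare.

```python
def find_sublist(haystack, needle):
    """Return every start index where `needle` occurs as a contiguous run in `haystack`."""
    if not needle:
        return []  # assumption: an empty sub-list has no meaningful positions
    positions = []
    n = len(needle)
    first = needle[0]
    last_start = len(haystack) - n  # last index where a full window still fits
    i = 0
    while i <= last_start:
        try:
            # Jump to the next candidate: the next occurrence of the first element
            i = haystack.index(first, i, last_start + 1)
        except ValueError:
            break  # no further candidates
        if haystack[i:i + n] == needle:
            positions.append(i)
        i += 1  # step by one so overlapping matches are also reported
    return positions


vll = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13',
       '7', '11', '9', '7', '15', '9', '10', '10', '6', '16']
sl = ['10', '10']
print(find_sublist(vll, sl))  # [7, 16]
```

Whether this is faster than the plain list comprehension depends on the data, so it is worth timing both on a realistic list.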

score 1 · Answer 2 · answered Jan 16 '18 at 04:15

What is the most Pythonic way? Well... that depends on what you are trying to accomplish and what you want to optimize for...

If your use case only needs to check for the existence-of and locations for a single subset in a single run of your code... The code you have could suffice. Depending on the data source for your "large list," generators could help you with memory efficiency, but I don't think that is what you are after.

As you have working code for your particular challenge, I am guessing that you want to optimize the performance of these "subset lookups", meaning you need to check the list for the presence and locations of multiple subsets (pairs?). If so, to optimize for lookup speed (at the expense of memory), you could iterate through the long list once and build an index of all subsets and their locations in a Python dictionary, like so:

from collections import defaultdict

large_list = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', '7', '11',
              '9', '7', '15', '9', '10', '10', '6', '16']

indexed_subsets = defaultdict(list)

for i in range(len(large_list)-1):
    subset = (large_list[i], large_list[i+1])
    indexed_subsets[subset].append(i)


# Test if subset exists
print(('10', '10') in indexed_subsets)

# Print locations where the subset exists
print(indexed_subsets.get(('10', '10')))

# Output:
# True
# [7, 16]

This method has the benefit that, once the index is built, both checking for the existence of a subset and getting its locations are fast (an average O(1) dictionary lookup vs. an O(n) scan); the trade-off is that the dictionary will be much bigger than the already "large list" that you want to process.

...it is all about what you are wanting to optimize for.
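
The pair index above can be generalized to windows of any fixed length. This is a minimal sketch under stated assumptions: the function name `build_window_index` and the parameter `n` are hypothetical, and the index only answers queries for sub-lists of exactly length `n`.

```python
from collections import defaultdict


def build_window_index(seq, n):
    """Map each length-n window of `seq` (as a tuple) to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - n + 1):
        # Tuples are hashable, so they can serve as dictionary keys
        index[tuple(seq[i:i + n])].append(i)
    return index


large_list = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', '7', '11',
              '9', '7', '15', '9', '10', '10', '6', '16']

index = build_window_index(large_list, 2)

# .get() with a default avoids inserting missing keys into the defaultdict
print(index.get(('10', '10'), []))  # [7, 16]
print(index.get(('10', '13'), []))  # [8]
print(index.get(('13', '13'), []))  # []
```

Building the index visits every window once, so it only pays off when many lookups of the same length are made against the same list.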

It seems your code doesn't work for a very large list. Results: [22570, 93853, 150320], time elapsed: 0.1750638484954834 s; [22570, 93853, 150323], time elapsed: 0.03659486770629883 s; False None, time elapsed: 0.2751467227935791 s. The last one is your code. — miket, Jan 18 '18 at 13:02
I’m not sure what inputs you were passing nor if you are doing a single lookup or multiple. As it wasn’t clear from your question, I assumed that you were wanting to look for subsets that were two elements in length (pairs), and wanting to optimize the speed for performing multiple lookups. — cmlccie, Jan 20 '18 at 11:33
If you are wanting to lookup arbitrarily long subsets, then the code in the accepted solution is probably your best bet - with the syntactic sugar of a list comprehension it is reasonably understandable and concise. — cmlccie, Jan 20 '18 at 11:36
If you do need to optimize the speed of multiple lookups with arbitrarily long subsets (and yes, a single lookup will be slower, as you have to build the index before doing the lookup), you could still create a subset index, but it will be large and take time to build. Once it is built, individual lookups will be faster than linearly searching through a long list for each subset: O(1) vs. O(n). — cmlccie, Jan 20 '18 at 11:42

score 0 · Answer 3 · answered Jan 18 '18 at 13:20

This way is faster; I don't know whether there is an even faster way:

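# Assumes vll and sl are lists of strings, as in the accepted answer
s_vll = str(vll)
s_sl = str(sl).replace("[", "").replace("]", "")
nl = s_vll.split(s_sl)
p = []
c = 0
if len(nl) > 1:
    for i in range(0, len(nl) - 1):
        # Commas before this match, plus len(sl) - 1 for the previous match (none before the first)
        c += nl[i].count(",") + (len(sl) - 1 if i else 0)
        p.append(c)
print(p)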
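
Before relying on the string-splitting idea, it may help to cross-check it against the slicing approach. The sketch below wraps both in functions; the names `positions_by_split` and `positions_by_slicing` are hypothetical. It assumes a non-empty sub-list whose elements are strings without quotes, commas, or brackets, and it adds `len(sl) - 1` for every match after the first. Because `str.split` consumes each match, overlapping occurrences are not reported by the string version.

```python
def positions_by_split(vll, sl):
    """String-splitting search; assumes plain string elements and a non-empty sl."""
    s_vll = str(vll)
    s_sl = str(sl)[1:-1]  # drop the surrounding brackets
    pieces = s_vll.split(s_sl)
    positions = []
    c = 0
    for i, piece in enumerate(pieces[:-1]):
        # Commas in this piece, plus the elements of the previous match beyond its first
        c += piece.count(",") + (len(sl) - 1 if i else 0)
        positions.append(c)
    return positions


def positions_by_slicing(vll, sl):
    """Reference implementation: compare every window of len(sl) elements."""
    return [i for i in range(len(vll) - len(sl) + 1) if sl == vll[i:i + len(sl)]]


vll = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13',
       '7', '11', '9', '7', '15', '9', '10', '10', '6', '16']
sl = ['10', '10']
print(positions_by_split(vll, sl))    # [7, 16]
print(positions_by_slicing(vll, sl))  # [7, 16]

# Overlapping occurrences: the split-based version skips the second one
print(positions_by_split(['10', '10', '10'], sl))    # [0]
print(positions_by_slicing(['10', '10', '10'], sl))  # [0, 1]
```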
