I have a very large list and want to find the positions at which a given sublist occurs. I tried this:

l = ['7', '10', '8', '8', '6', '13', '7', '10', '13', '13', 
'7', '11', '9', '7', '15', '9', '10', '13', '6', '16']

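# Membership check only: this does not test whether '10' and '13' are adjacent.
print(set(['10', '13']).issubset(set(l)))

# Collect every index i where '10' is immediately followed by '13'.
k = []
for i in range(len(l) - 1):
    if l[i] == '10' and l[i + 1] == '13':
        k.append(i)

print(k)

# Output:
# True
# [7, 16]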

If the list is very large, I don't think this is the Pythonic way. Is there a better way?

miket

3 Answers

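Slice a window of length len(sl) out of the very long list vll, starting at each index i in range(len(vll)-len(sl)+1),

and check whether that window equals the sublist: sl == vll[i:i+len(sl)].

Collecting every i for which the comparison is true gives the positions: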

vll = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', 
'7', '11', '9', '7', '15', '9', '10', '10', '6', '16']

sl = ['10', '10']

[i for i in range(len(vll)-len(sl)+1) if sl == vll[i:i+len(sl)]]

Out[986]: [7, 16]
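
The same comparison can be wrapped in a reusable function. This is only a minimal sketch: find_sublist is a made-up helper name, not a standard-library function, and checking the first element before slicing is an optional shortcut added here so that a slice is only built at indices that can possibly match.

```python
def find_sublist(vll, sl):
    """Return every start index at which sl occurs as a contiguous run in vll."""
    n = len(sl)
    if n == 0:
        return []  # assumption: an empty sublist is treated as "no matches"
    first = sl[0]
    return [
        i
        for i in range(len(vll) - n + 1)
        # compare the first element before building the slice
        if vll[i] == first and vll[i:i + n] == sl
    ]


vll = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13',
       '7', '11', '9', '7', '15', '9', '10', '10', '6', '16']

print(find_sublist(vll, ['10', '10']))      # [7, 16]
print(find_sublist(vll, ['6', '13', '7']))  # [4]
```

It still scans the whole list once per query, so it suits a single lookup better than many repeated ones.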
f5r5e5d

What is the most Pythonic way? Well... that depends on what you are trying to accomplish and what you want to optimize for...

If your use case only needs to check for the existence and locations of a single subset in a single run of your code, the code you have could suffice. Depending on the data source for your "large list," generators could help you with memory efficiency, but I don't think that is what you are after.

Since you already have working code for your particular challenge, I am guessing that you want to optimize the performance of these "subset lookups", meaning you need to check the list for the presence and locations of multiple subsets (pairs?). If so, to optimize for lookup speed (at the expense of memory), you could iterate through the long list once and build an index of all subsets and their locations in a Python dictionary, like so:

from collections import defaultdict

large_list = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', '7', '11',
              '9', '7', '15', '9', '10', '10', '6', '16']

indexed_subsets = defaultdict(list)

for i in range(len(large_list)-1):
    subset = (large_list[i], large_list[i+1])
    indexed_subsets[subset].append(i)


# Test if subset exists
print(('10', '10') in indexed_subsets)

# Print locations where the subset exists
print(indexed_subsets.get(('10', '10')))

# Output:
# True
# [7, 16]

This method has the benefit that, once the index has been built in a single O(n) pass, both checking for the existence of a subset and getting its locations are fast: an average O(1) dictionary lookup instead of an O(n) scan per query. The cost is that the dictionary will be much bigger than the already "large list" that you want to process.

...it is all about what you are wanting to optimize for.
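
The pair index above can be generalized to windows of any fixed length. The following is a rough sketch under the assumption that every lookup uses the same window length n; build_index and n are illustrative names that do not appear in the code above.

```python
from collections import defaultdict


def build_index(items, n):
    """Map each length-n window (as a tuple) to the list of its start indices."""
    index = defaultdict(list)
    for i in range(len(items) - n + 1):
        # tuples are hashable, so they can be used as dictionary keys
        index[tuple(items[i:i + n])].append(i)
    return index


large_list = ['7', '10', '8', '8', '6', '13', '7', '10', '10', '13', '7', '11',
              '9', '7', '15', '9', '10', '10', '6', '16']

index = build_index(large_list, 3)

print(('10', '10', '6') in index)         # True
print(index.get(('6', '13', '7'), []))    # [4]
print(index.get(('10', '10', '13'), []))  # [7]
```

Every window becomes a key, so memory grows with both the list length and n; supporting several window lengths would need one index per length.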

cmlccie
  • It seems your code doesn't work for a very large list. [22570, 93853, 150320], time elapsed: 0.1750638484954834 s; [22570, 93853, 150323], time elapsed: 0.03659486770629883 s; False, None, time elapsed: 0.2751467227935791 s. The last one is your code. – miket Jan 18 '18 at 13:02
  • I'm not sure what inputs you were passing, nor whether you are doing a single lookup or multiple lookups. As it wasn't clear from your question, I assumed that you wanted to look for subsets that were two elements in length (pairs) and to optimize the speed of performing multiple lookups. – cmlccie Jan 20 '18 at 11:33
  • If you want to look up arbitrarily long subsets, then the code in the accepted solution is probably your best bet. With the syntactic sugar of a list comprehension it is reasonably understandable and concise. – cmlccie Jan 20 '18 at 11:36
  • If you do need to optimize the speed of multiple lookups with arbitrarily long subsets (and yes, a single lookup will be slower, as you have to build the index and then do the lookup), you could still create a subset index, but it will be large and take time to build. Once it exists, individual lookups will be faster than linearly searching through a long list for each subset: O(1) vs. O(n). – cmlccie Jan 20 '18 at 11:42

This way is faster. I don't know whether there is an even faster way than this:

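# vll and sl are the list and sublist from the first answer.
s_vll = str(vll)
s_sl = str(sl).replace("[", "").replace("]", "")
nl = s_vll.split(s_sl)  # pieces of text between non-overlapping matches
p = []
c = 0  # running count of commas outside the matches
for i in range(len(nl) - 1):
    c += nl[i].count(",")
    # each of the i earlier matches contains len(sl) - 1 commas
    p.append(c + i * (len(sl) - 1))
print(p)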
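
One thing to keep in mind with the string approach: str.split only finds non-overlapping occurrences, so the result can differ from a slice-based search when matches overlap. Below is a small sketch of that difference. find_by_split and find_by_slice are hypothetical names, the offset arithmetic counts len(sl) - 1 commas exactly once for each earlier match, and it assumes the elements are simple strings that contain no commas themselves.

```python
def find_by_split(vll, sl):
    """String-split search: counts commas to turn text offsets into list indices."""
    s_vll = str(vll)
    s_sl = str(sl)[1:-1]  # repr of the sublist without the surrounding brackets
    parts = s_vll.split(s_sl)
    positions = []
    commas = 0  # commas seen outside the matches so far
    for i in range(len(parts) - 1):
        commas += parts[i].count(",")
        # each of the i earlier matches swallowed len(sl) - 1 commas
        positions.append(commas + i * (len(sl) - 1))
    return positions


def find_by_slice(vll, sl):
    """Slice-based search: compares a window at every index."""
    n = len(sl)
    return [i for i in range(len(vll) - n + 1) if vll[i:i + n] == sl]


data = ['10', '10', '10', '7']

print(find_by_split(data, ['10', '10']))  # [0]
print(find_by_slice(data, ['10', '10']))  # [0, 1]
```

When overlapping matches cannot occur or do not matter, both functions return the same positions.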
miket