I'm having a hard time figuring out the best way to search across two lists. I've explored many posts here that suggest using things like any(), all(), or regex. I do have it working now, but I use this type of search a lot and I would really like to be doing it right.

SearchList = ['blah-1.2.3.tar.gz', 'blah-1.2.4.tar.gz', 'blah-1.2.5.tar.gz']
BaseList = ['blah-1.2.3', 'blah-1.2.4']

I would like to check, for each item in SearchList ('1.2.3', '1.2.4', and '1.2.5'), whether it has a matching entry in BaseList. I have been using a nested for loop, but I would like something cleaner that uses Python's list comprehensions or generator expressions, any(), or some regex-based solution.

pn1 dude
  • please post your current solution ... but it sounds like this is better suited to codereview.stackexchange.com – Joran Beasley Dec 05 '13 at 17:57
  • I don't want my solution... I was asking for any suggestions that didn't include a 'for within a for' type solution. – pn1 dude Dec 05 '13 at 18:04
  • I would suggest using a regex to only get the comparable/to-be-searched strings and do the search/compare with python's `set()` – sphere Dec 05 '13 at 18:10
  • Thanks Joran for the tip about codereview.stackexchange.com just checked it out and signed up... Looks very cool. – pn1 dude Dec 05 '13 at 18:23

2 Answers

You can just do:

[a for a in SearchList if a[:-7] in BaseList]

The a[:-7] will strip the .tar.gz at the end, the rest is basic list comprehension. It will return a list of elements from SearchList that correspond to elements from BaseList.
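As a rough sketch of the same idea applied to the sample data from the question, the block below also collects the items that are not in BaseList. The names `search_list`, `base_list`, `base_set`, `suffix`, `matched`, and `missing` are illustrative only, and converting the base list to a set is an optional tweak that should help when the list is large.

```python
search_list = ['blah-1.2.3.tar.gz', 'blah-1.2.4.tar.gz', 'blah-1.2.5.tar.gz']
base_list = ['blah-1.2.3', 'blah-1.2.4']

# Assumption: every entry ends with the same suffix, so a fixed-length
# slice is enough to recover the base name.
suffix = '.tar.gz'
base_set = set(base_list)  # set membership tests avoid rescanning the list

# Entries whose stripped name appears in the base list.
matched = [a for a in search_list if a[:-len(suffix)] in base_set]

# Entries whose stripped name does not appear in the base list.
missing = [a for a in search_list if a[:-len(suffix)] not in base_set]

print(matched)
print(missing)
```

Writing `a[:-len(suffix)]` instead of a literal `-7` keeps the slice correct if the suffix changes, but it still assumes all files share that one suffix.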

Boo
  • Perfect. Is there a way to also grab what doesn't exist in BaseList? – pn1 dude Dec 05 '13 at 18:09
  • I was using something like a[:-7] but a bit more complex as I don't always know blah. What is this called btw? And where can one find info on it? – pn1 dude Dec 05 '13 at 18:14
  • Python's slice notation, you mean? http://stackoverflow.com/questions/509211/pythons-slice-notation – supermitch Dec 05 '13 at 18:25
  • What is what called? The [:-7] bit? That's using slice notation. The brackets indicate an index, like you'd use to query a list or a dict (a string is a sequence of characters), the colon indicates a range, and the numbers are the start and end indices. A negative number counts from the right instead of the left. For example, for `x = "abcdefg"`, `x[0] == "a"`, `x[0:2] == "ab"`, `x[2:] == "cdefg"`, `x[:2] == "ab"`, `x[-3] == "e"`, `x[-3:-1] == "ef"`, `x[-3:] == "efg"`, `x[:-3] == "abcd"` – Adam Smith Dec 05 '13 at 18:30
  • In case your Searchlist has many different types of compressed files you can do a.replace('.zip', '').replace('.tar', '').replace('.gz', '') etc. Assuming you won't have one of those as part of the filename (like 'myfile.zipfoobar.zip') which would be weird anyway. Regex will get you a perfect check. – Boo Dec 05 '13 at 18:32
  • No sorry I didn't mean slice notation but the rest of it. – pn1 dude Dec 06 '13 at 16:03
  • The rest of it is a simple list comprehension: http://docs.python.org/2/tutorial/datastructures.html#list-comprehensions. [f(x) for x in somelist] is how you use it (in its simplest form). So for instance, if you want to add 3 to every element of [1, 2, 3], you can do [x + 3 for x in [1, 2, 3]]. You can also add a condition that decides whether each f(x) result is included in the returned list, for instance [x + 3 for x in [1, 2, 3] if x != 2] == [4, 6]. Hope this helps. – Boo Dec 06 '13 at 16:06
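
The comments above mention that a regex gives a more reliable check than chained replace() calls when several archive types are mixed. A minimal sketch of that idea follows; the helper name `strip_archive_suffix` and the particular list of extensions are assumptions made for illustration, not something given in the thread.

```python
import re

# Assumption: these are the archive extensions of interest. The pattern is
# anchored with $ so only a trailing extension is removed.
ARCHIVE_SUFFIX = re.compile(r'\.(?:tar\.gz|tar\.bz2|tgz|zip|tar|gz)$')

def strip_archive_suffix(name):
    """Return name without its trailing archive extension, if it has one."""
    return ARCHIVE_SUFFIX.sub('', name)

names = ['blah-1.2.3.tar.gz', 'blah-1.2.4.zip', 'myfile.zipfoobar.zip']
base_set = {'blah-1.2.3', 'blah-1.2.4'}

# Keep only the files whose stripped name is a known base name.
found = [n for n in names if strip_archive_suffix(n) in base_set]
print(found)
```

Because the pattern is anchored at the end of the string, a name such as 'myfile.zipfoobar.zip' only loses its final '.zip', which chained replace() calls would not guarantee.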

Alternatively, take advantage of the fact that str.startswith accepts a tuple of prefixes:

t = tuple(BaseList)
[x for x in SearchList if x.startswith(t)]
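
A small sketch of the tuple form on the question's sample data, with one extra made-up entry ('blah-1.2.30.tar.gz') added to illustrate a caveat: startswith is a plain prefix test, so a base name also matches any longer version that begins with the same characters. The variable names here are illustrative.

```python
search_list = [
    'blah-1.2.3.tar.gz',
    'blah-1.2.4.tar.gz',
    'blah-1.2.5.tar.gz',
    'blah-1.2.30.tar.gz',  # hypothetical entry, added to show the caveat
]
base_list = ['blah-1.2.3', 'blah-1.2.4']

# str.startswith accepts a tuple and returns True if any prefix matches.
prefixes = tuple(base_list)

matched = [x for x in search_list if x.startswith(prefixes)]
unmatched = [x for x in search_list if not x.startswith(prefixes)]

print(matched)    # note that 'blah-1.2.30.tar.gz' is included
print(unmatched)
```

If that kind of accidental prefix match matters, comparing the stripped name for equality, as in the slice-based approach, avoids it.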
iruvar
  • I tried [x for x in SearchList if not x.startswith(tuple(BaseList))] and yes that works fine as well. It might be a little (just...) slower than the one with slice notation. – pn1 dude Dec 08 '13 at 01:43