Attempting to filter through itemset

Question

I have a list called data, in which every itemset is the first element of a tuple. One instance looks like this:

(('5', 'Generous', '<=X'), 0.33333333333333)

I am trying to filter through the itemset in two parts:

Part 1 - I want to keep only the instances that contain '<=X' within the itemset itself (the first element of each tuple).

Code for Part 1:

for i in data:
    if "<=X" not in i[0]:
        del i

Part 2 - I then want to filter the remaining instances so that only itemsets with three or more items are left.

Code for Part 2:

for i in data:
    if len(i[0]) < 3:
        del i

When I run the above code I end up with an empty list. However, I have looked through the list with a variable inspector and seen that matching instances exist; they just don't show up after filtering. What's wrong?

What is the desired output? – Anton vBR Feb 16 '18 at 22:41

AGN Gazer · Accepted Answer · 2018-02-16T23:03:53.560

1

Hopefully this works for you:

list(filter(lambda x: len(x[0]) >= 3, filter(lambda x: '<=X' in x[0], lst)))

or, combining both conditions:

list(filter(lambda x: len(x[0]) >= 3 and '<=X' in x[0], lst))

Also, notice that the del statements in your loops do not do what (I think) you want them to: del only unbinds the loop variable and leaves the list itself unchanged:

In [175]: data = [1, 2, 3, 4]

In [176]: for k in data:
     ...:     del k
     ...:     

In [177]: data
Out[177]: [1, 2, 3, 4]

Moreover, modifying a list while looping over it is a bad idea.
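To see why, here is a minimal sketch (the list contents are made up for illustration) of how removing elements during iteration silently skips items, next to a comprehension that builds a new list instead:

```python
# Hypothetical sample data: the goal is to remove every even number.
nums = [2, 4, 6, 7, 8]

# Buggy: each removal shifts the remaining items one place to the left,
# so the loop's internal index steps over the item that follows.
buggy = list(nums)
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)

# Safe: build a new list and leave the original untouched.
safe = [n for n in nums if n % 2 != 0]

print(buggy)  # still contains 4, which the loop never visited
print(safe)
```

The loop never raises an error; it just leaves some even numbers behind, which is what makes this kind of bug hard to spot.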

EXAMPLE:

In [183]: lst = [(('5', 'Generous', '<=X'), 0.33333333333333),
     ...:        (('Generous', '<=X'), 0.33333333333333)]

In [184]: list(filter(lambda x: len(x[0]) >= 3, filter(lambda x: '<=X' in x[0], lst)))
Out[184]: [(('5', 'Generous', '<=X'), 0.33333333333333)]

edited Feb 16 '18 at 23:03

answered Feb 16 '18 at 22:43

AGN Gazer


I see. In your solution, you use `< 3`. Does that mean it deletes every set with fewer than three items? In which case, won't your second filter delete every itemset that contains '<=X' as well? Lastly, would it be better to store the result in another variable, or will this store the result in `lst`? – tushariyer Feb 16 '18 at 22:48
@tushariyer I made a mistake and I fixed it in my last edit. – AGN Gazer Feb 16 '18 at 22:51
@tushariyer I also added an example – AGN Gazer Feb 16 '18 at 22:53
@AGNGazer Python 2 example though right? `filter` doesn't return a list in Python 3. – G_M Feb 16 '18 at 22:54
@DeliriousLettuce I have improved compatibility in my latest edit – AGN Gazer Feb 16 '18 at 22:59
@AGNGazer I see that but now wouldn't it be less efficient in Python 2? Wouldn't you be creating two lists instead of one? That is one advantage of using list comprehensions instead of `filter`, cross-compatibility with the same code. – G_M Feb 16 '18 at 23:01
@DeliriousLettuce Alright, I combined both conditions into a single one. Thanks – AGN Gazer Feb 16 '18 at 23:10
@AGNGazer That's good but that isn't what I meant. `filter(...)` returns a list in Python 2 but you are then wrapping it in `list(filter(...))`. Wouldn't that make another copy of the list created by `filter(...)` (in Python 2)? In Python 3, it seems like it would only create one list. – G_M Feb 16 '18 at 23:13
@DeliriousLettuce I hope Python 2 is smart enough to see that `list()` called on a list only needs to return its input. The loss of efficiency would be minimal/negligible. – AGN Gazer Feb 16 '18 at 23:19
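A short sketch of the difference discussed in these comments, assuming Python 3 (the sample `lst` and the helper `keep` are made up for illustration): `filter` returns a lazy iterator that `list()` materializes once, while a list comprehension produces a list under either Python version.

```python
# Hypothetical sample data in the same shape as the question's itemsets.
lst = [
    (('5', 'Generous', '<=X'), 0.33),
    (('Generous', '<=X'), 0.33),
]

def keep(item):
    """Keep itemsets that contain '<=X' and have at least three items."""
    return '<=X' in item[0] and len(item[0]) >= 3

lazy = filter(keep, lst)          # Python 3: a filter object, not a list
is_list = isinstance(lazy, list)  # False under Python 3
from_filter = list(lazy)          # materialize the iterator once
from_comprehension = [item for item in lst if keep(item)]

# list() called on a list builds a new (shallow) copy every time.
copied = list(from_filter)
is_same_object = copied is from_filter
```

Note that `list()` never hands back its argument: calling it on an existing list always creates a new shallow copy, so in Python 2 the extra `list(...)` wrapper does cost one additional copy.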

G_M · Answer 2 · 2018-02-16T22:50:29.850

>>> part_1_data = [
...     (('5', 'Generous', '<=X'), 0.33333333333333),
...     (('6', 'Generous', '<=Y'), 0.33333333333333),
...     (('7', 'Generous', '<=Z'), 0.33333333333333)
... ]
>>> part_1 = [elem for elem in part_1_data if '<=X' in elem[0]]
>>> part_1
[(('5', 'Generous', '<=X'), 0.33333333333333)]


>>> part_2_data = [
...     (('5', 'Generous', '<=X'), 0.33333333333333),
...     (('6', 'Generous'), 0.33333333333333),
...     (('7',), 0.33333333333333)
... ]
>>> part_2 = [elem for elem in part_2_data if len(elem[0]) >= 3]
>>> part_2
[(('5', 'Generous', '<=X'), 0.33333333333333)]


>>> both = [
...     (('1', 'Generous', '<=X', 4), 0.33333333333333),
...     (('2', 'Generous', '<=Y'), 0.33333333333333),
...     (('3', 'Generous', '<=Z'), 0.33333333333333),
...     (('4', 'Generous', '<=X'), 0.33333333333333),
...     (('5', 'Generous'), 0.33333333333333),
...     (('6',), 0.33333333333333)
... ]
>>> [elem for elem in both if len(elem[0]) >= 3 and '<=X' in elem[0]]
[(('1', 'Generous', '<=X', 4), 0.33333333333333), (('4', 'Generous', '<=X'), 0.33333333333333)]

Wow! I never thought to tackle it like this. Would I be able to just store the result in another variable and be fine? – tushariyer Feb 16 '18 at 22:51
@tushariyer It would probably be easier to just save them in a new variable. Modifying a list while you iterate over it is never a good idea. – G_M Feb 16 '18 at 22:53
Thanks lettuce! I love learning ways to manipulate lists like this – tushariyer Feb 16 '18 at 22:53

score 0 · Answer 3 · answered Feb 16 '18 at 22:54

In Python 3 (3.5 or later), you can use unpacking in a list comprehension:

s = [(('5', 'Generous', '<=X'), 0.33333333333333)]
final_results = [((*b, c), a) for (*b, c), a in s if len(b) + 1 >= 3 and c == '<=X']

Ajax1234


`[v for v in lst if ('<=X' in v[0] and len(v[0]) >= 3)]` works in Python 2 as well – AGN Gazer Feb 16 '18 at 23:02
@AGNGazer you may be running a different version from me; however, `*` is not valid syntax in Python 2 when used for list/tuple unpacking. – Ajax1234 Feb 16 '18 at 23:07
I think your code is more complicated because you want to unpack *all* items *assuming* '<=X' is in `c` but OP does not mention that: '<=X' can be anywhere. So, if you want to do any unpacking the following would be enough: `[(v, a) for v, a in lst if '<=X' in v and len(v) >= 3]` – AGN Gazer Feb 16 '18 at 23:16
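A minimal sketch of the difference raised in this comment thread, assuming Python 3.5+ and made-up sample data: starred unpacking only inspects the last item of each itemset, while a membership test finds '<=X' at any position.

```python
# Hypothetical sample: '<=X' is the last item in the first itemset
# and the second item in the other.
s = [
    (('5', 'Generous', '<=X'), 0.33),
    (('5', '<=X', 'Generous'), 0.33),
]

# Starred unpacking binds c to the last item, so only that one is checked.
last_only = [((*b, c), a) for (*b, c), a in s
             if len(b) + 1 >= 3 and c == '<=X']

# A membership test matches '<=X' wherever it appears.
anywhere = [(v, a) for v, a in s if '<=X' in v and len(v) >= 3]

print(len(last_only), len(anywhere))
```

Which version is right depends on whether '<=X' is guaranteed to be the final item; the question does not say so, which makes the membership test the safer default.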

score 0 · Answer 4 · answered Feb 17 '18 at 02:26

You are trying to modify a sequence while iterating through it. See "Remove items from a list while iterating" for further info.

IMO you should consider using list comprehensions, which have several advantages over your current approach.

Firstly: performance. List comprehensions are usually faster than an equivalent for loop that calls append; see "Speed of list comprehension vs for loop with append", "Efficiency of list comprehensions", and "Are these list-comprehensions written the fastest possible way?".

Secondly: code readability. List comprehension syntax is much more compact and thus easier to read, which makes it more "pythonic". See "List Comprehensions Explained Visually".
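The performance point can be checked on your own machine with a rough sketch like the one below. The data size, repetition count, and function names are arbitrary assumptions, and the timings will vary between machines and Python versions, so no specific speed-up is claimed here.

```python
import timeit

# Hypothetical data: 10,000 itemsets alternating between two shapes.
data = [(('5', 'Generous', '<=X'), 0.33), (('6', 'Generous'), 0.33)] * 5000

def with_loop():
    """Filter with an explicit for loop and append."""
    out = []
    for item in data:
        if '<=X' in item[0] and len(item[0]) >= 3:
            out.append(item)
    return out

def with_comprehension():
    """Filter with a list comprehension."""
    return [item for item in data
            if '<=X' in item[0] and len(item[0]) >= 3]

# Both produce the same result; only the timings may differ.
same = with_loop() == with_comprehension()
loop_time = timeit.timeit(with_loop, number=20)
comp_time = timeit.timeit(with_comprehension, number=20)
print(same, loop_time, comp_time)
```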

By the way: even if modifying a sequence while iterating over it were a good idea, in most cases it is very convenient to keep the input (original) and output (processed) datasets separate, since that lets you derive multiple further outputs by filtering against different conditions. Replacing the original data with the filtered data makes later changes to the filtering conditions impossible. On the other hand, replacement is sometimes advantageous when processing the input data is very time- and/or resource-consuming, or simply when it is absolutely certain that the original dataset won't be needed anymore.
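The two options can be sketched as follows, using made-up sample data and hypothetical variable names. Separate output lists keep the original available for further filters, whereas slice assignment replaces the contents of the same list object.

```python
# Hypothetical sample data in the question's shape.
data = [
    (('1', 'Generous', '<=X', 4), 0.33),
    (('2', 'Generous', '<=Y'), 0.33),
    (('5', 'Generous'), 0.33),
]

# Separate outputs: the original stays available for other filters.
with_x = [item for item in data if '<=X' in item[0]]
long_sets = [item for item in data if len(item[0]) >= 3]

# In-place replacement: slice assignment swaps the contents of the
# same list object, so the unfiltered items are gone afterwards.
working = list(data)   # copy, so `data` survives for comparison
alias = working        # a second name for the same list object
working[:] = [item for item in working if '<=X' in item[0]]

print(len(data), len(with_x), len(long_sets), len(alias))
```

Slice assignment is only worth reaching for when other references to the list (like `alias` here) must see the filtered contents; otherwise plain rebinding to a new variable is simpler.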

Further reading (performance optimizations and pitfalls): "Generators or List comprehensions", "Beware the Python generators".

A quick and simple (though a bit dirty) solution for filtering the data structure described in your question:

In [1]: dat = [(('1', 'Generous', '<=X', 4), 0.33333333333333),
   ...:  (('2', 'Generous', '<=Y'), 0.33333333333333),
   ...:  (('3', 'Generous', '<=Z'), 0.33333333333333),
   ...:  (('4', 'Generous', '<=X'), 0.33333333333333),
   ...:  (('5', 'Generous'), 0.33333333333333),
   ...:  (('6',), 0.33333333333333)]
   ...: 
   ...: result = [itm for itm in dat if ('<=X' in itm[0]) and (len(itm[0]) >= 3)]

In [2]: dat
Out[2]: 
[(('1', 'Generous', '<=X', 4), 0.33333333333333),
 (('2', 'Generous', '<=Y'), 0.33333333333333),
 (('3', 'Generous', '<=Z'), 0.33333333333333),
 (('4', 'Generous', '<=X'), 0.33333333333333),
 (('5', 'Generous'), 0.33333333333333),
 (('6',), 0.33333333333333)]

In [3]: result
Out[3]: 
[(('1', 'Generous', '<=X', 4), 0.33333333333333),
 (('4', 'Generous', '<=X'), 0.33333333333333)]
