Finds all rows fitting to combinatorial condition

Question

I'm looking for the best way to do this using python\excel\sql\google sheets - I need to find all rows which fits to k values from list of n values.

For example I have this table called Animals:

| Name     | mammal | move   |  dive |
+----------+--------+--------+-------+
| Giraffe  |  1     |    1   |   0   |
| Frog     |  0     |    1   |   1   |
| Dolphin  |  1     |    1   |   1   |
| Snail    |  0     |    1   |   0   | 
| Bacteria |  0     |    0   |   0   |

And I want to write a function foo that do behave like that:

foo(tuple of Boolean values, minimum matches)

foo((1,1,1),3) -> Dolphin
foo((1,1,1),2) -> Giraffe, Dolphin, Frog
foo((1,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,0),2) -> Giraffe, Dolphin
foo((0,1,1),2) -> Dolphin, Frog
foo((0,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,1),0) -> Giraffe, Dolphin, Frog, Snail, Bacteria

What's the best way you think about?

How is foo((1,1,0),2) only Giraffe, Dolphin? Wouldn't there be snail? — TheMaster, Nov 19 '17 at 14:04
Only Dolphin and Frog move and dive – this is reply for foo((0,1,1),2) -> Dolphin, Frog, but both dolphin & snail is 1.not a mammal and snail can 2.move. 2 matches. — TheMaster, Nov 19 '17 at 14:08

PM 2Ring · Accepted Answer · 2017-11-19T14:22:11.373

Here's a pure Python 3 solution.

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for name, *row in data:
        if sum(u & v for u, v in zip(mask, row)) >= minmatch:
            yield name

for mask, minmatch in probes:
    print(mask, minmatch, *foo(mask, minmatch))

output

(1, 1, 1) 3 Dolphin
(1, 1, 1) 2 Giraffe Frog Dolphin
(1, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 0) 2 Giraffe Dolphin
(0, 1, 1) 2 Frog Dolphin
(0, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 1) 0 Giraffe Frog Dolphin Snail Bacteria

Tested on Python 3.6.0. It uses some syntax that's not available in older versions, but it's easy to adapt it to use older syntax.

This variation runs on older versions of Python. Tested on Python 2.6.6.

from __future__ import print_function

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for row in data:
        if sum(u & v for u, v in zip(mask, row[1:])) >= minmatch:
            yield row[0]

for mask, minmatch in probes:
    matches = list(foo(mask, minmatch))
    print(mask, minmatch, matches)

output

(1, 1, 1) 3 ['Dolphin']
(1, 1, 1) 2 ['Giraffe', 'Frog', 'Dolphin']
(1, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 0) 2 ['Giraffe', 'Dolphin']
(0, 1, 1) 2 ['Frog', 'Dolphin']
(0, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 1) 0 ['Giraffe', 'Frog', 'Dolphin', 'Snail', 'Bacteria']

@SuperMario It's fairly new syntax that's only available in recent versions of Python. Basically, `name, *row` splits the current `data` tuple so the 1st tuple item gets assigned to `name` and the rest of the tuple gets assigned to `row`. Here's a simpler example: `a, *b = (1,2,3)` puts `1` into `a` and `(2,3)` into `b`. Similarly, `*a, b=(4,5,6)` puts `(4,5)` into `a` and `6` into `b`. — PM 2Ring, Nov 19 '17 at 14:20
@SuperMario Yes. :) It's called [Extended Iterable Unpacking](https://www.python.org/dev/peps/pep-3132/#acceptance). It is mentioned on [this Stack overflow page](https://stackoverflow.com/q/6967632/4014959), with lots of other variations of tuple unpacking. — PM 2Ring, Nov 19 '17 at 14:29

SCKU · Answer 2 · 2017-11-19T14:15:53.430

I will try to use python with pandas

assume "Name" column is pandas index:

def foo(df, bool_index, minimum_matches):
    picked_column_index = [ idx for (idx, i) in enumerate(bool_index) if i] # select where "1" is
    picked_df = df.iloc[:, picked_column_index] #select column by location
    matched_row_bool = picked_df.sum(axis=1) >= minimum_matches
    return picked_df[matched_row_bool].index.tolist()

df is a pandas dataframe read from the table (Animals) maybe:

df = pandas.read_csv('animials_csv_file_path')

or

df = pandas.read_excel('animials_xls_file_path')

It will return a list include matched name

what is df in this function? – Super Mario Nov 19 '17 at 14:06 — Super Mario, Nov 19 '17 at 14:06

Bert Kellerman · Answer 3 · 2017-11-19T14:14:29.240

1

If the table is a pandas dataframe:

def foo(df, val, n_match):
    results = []
    for r in  df.values:
        if sum(val & r[1:]) >= n_match:
            results.append(r[0])
    print("foo(%s), %d -> %s") % (val, n_match, ' '.join(results))

edited Nov 19 '17 at 14:14

answered Nov 19 '17 at 14:09

Bert Kellerman

1,590
10
17

Finds all rows fitting to combinatorial condition

3 Answers3