0

I'm looking for the best way to do this using python\excel\sql\google sheets - I need to find all rows which fits to k values from list of n values.

For example I have this table called Animals:

| Name     | mammal | move   |  dive |
+----------+--------+--------+-------+
| Giraffe  |  1     |    1   |   0   |
| Frog     |  0     |    1   |   1   |
| Dolphin  |  1     |    1   |   1   |
| Snail    |  0     |    1   |   0   | 
| Bacteria |  0     |    0   |   0   | 

And I want to write a function foo that do behave like that:

foo(tuple of Boolean values, minimum matches)

foo((1,1,1),3) -> Dolphin
foo((1,1,1),2) -> Giraffe, Dolphin, Frog
foo((1,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,0),2) -> Giraffe, Dolphin
foo((0,1,1),2) -> Dolphin, Frog
foo((0,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,1),0) -> Giraffe, Dolphin, Frog, Snail, Bacteria 

What's the best way you think about?

Super Mario
  • 923
  • 10
  • 16

3 Answers3

5

Here's a pure Python 3 solution.

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for name, *row in data:
        if sum(u & v for u, v in zip(mask, row)) >= minmatch:
            yield name

for mask, minmatch in probes:
    print(mask, minmatch, *foo(mask, minmatch))

output

(1, 1, 1) 3 Dolphin
(1, 1, 1) 2 Giraffe Frog Dolphin
(1, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 0) 2 Giraffe Dolphin
(0, 1, 1) 2 Frog Dolphin
(0, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 1) 0 Giraffe Frog Dolphin Snail Bacteria

Tested on Python 3.6.0. It uses some syntax that's not available in older versions, but it's easy to adapt it to use older syntax.


This variation runs on older versions of Python. Tested on Python 2.6.6.

from __future__ import print_function

data = [
    ('Giraffe',  1, 1, 0),
    ('Frog',     0, 1, 1),
    ('Dolphin',  1, 1, 1),
    ('Snail',    0, 1, 0),
    ('Bacteria', 0, 0, 0),
]

probes = [
    ((1, 1, 1), 3),
    ((1, 1, 1), 2),
    ((1, 1, 1), 1),
    ((1, 1, 0), 2),
    ((0, 1, 1), 2),
    ((0, 1, 1), 1),
    ((1, 1, 1), 0),
]

def foo(mask, minmatch):
    for row in data:
        if sum(u & v for u, v in zip(mask, row[1:])) >= minmatch:
            yield row[0]

for mask, minmatch in probes:
    matches = list(foo(mask, minmatch))
    print(mask, minmatch, matches)

output

(1, 1, 1) 3 ['Dolphin']
(1, 1, 1) 2 ['Giraffe', 'Frog', 'Dolphin']
(1, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 0) 2 ['Giraffe', 'Dolphin']
(0, 1, 1) 2 ['Frog', 'Dolphin']
(0, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 1) 0 ['Giraffe', 'Frog', 'Dolphin', 'Snail', 'Bacteria']
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • @SuperMario It's fairly new syntax that's only available in recent versions of Python. Basically, `name, *row` splits the current `data` tuple so the 1st tuple item gets assigned to `name` and the rest of the tuple gets assigned to `row`. Here's a simpler example: `a, *b = (1,2,3)` puts `1` into `a` and `(2,3)` into `b`. Similarly, `*a, b=(4,5,6)` puts `(4,5)` into `a` and `6` into `b`. – PM 2Ring Nov 19 '17 at 14:20
  • is it has a name? – Super Mario Nov 19 '17 at 14:24
  • @SuperMario Yes. :) It's called [Extended Iterable Unpacking](https://www.python.org/dev/peps/pep-3132/#acceptance). It is mentioned on [this Stack overflow page](https://stackoverflow.com/q/6967632/4014959), with lots of other variations of tuple unpacking. – PM 2Ring Nov 19 '17 at 14:29
1

I will try to use python with pandas

assume "Name" column is pandas index:

def foo(df, bool_index, minimum_matches):
    picked_column_index = [ idx for (idx, i) in enumerate(bool_index) if i] # select where "1" is
    picked_df = df.iloc[:, picked_column_index] #select column by location
    matched_row_bool = picked_df.sum(axis=1) >= minimum_matches
    return picked_df[matched_row_bool].index.tolist()

df is a pandas dataframe read from the table (Animals) maybe:

df = pandas.read_csv('animials_csv_file_path')

or

df = pandas.read_excel('animials_xls_file_path')

It will return a list include matched name

SCKU
  • 783
  • 9
  • 14
1

If the table is a pandas dataframe:

def foo(df, val, n_match):
    results = []
    for r in  df.values:
        if sum(val & r[1:]) >= n_match:
            results.append(r[0])
    print("foo(%s), %d -> %s") % (val, n_match, ' '.join(results))
Bert Kellerman
  • 1,590
  • 10
  • 17