
I have a list like this:

dummy_list = [(8, 'N'),
 (4, 'Y'),
 (1, 'N'),
 (1, 'Y'),
 (3, 'N'),
 (4, 'Y'),
 (3, 'N'),
 (2, 'Y'),
 (1, 'N'),
 (2, 'Y'),
 (1, 'N')]

and I would like to get the largest value in the first element of the tuples where the second element is 'Y'.

How do I do this as efficiently as possible?

Naveen Reddy Marthala

5 Answers


You can use the max function with a generator expression.

>>> dummy_list = [(8, 'N'),
...  (4, 'Y'),
...  (1, 'N'),
...  (1, 'Y'),
...  (3, 'N'),
...  (4, 'Y'),
...  (3, 'N'),
...  (2, 'Y'),
...  (1, 'N'),
...  (2, 'Y'),
...  (1, 'N')]
>>>
>>> max(first for first, second in dummy_list if second == 'Y')
4
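One caveat: if no tuple has 'Y' in the second position, the generator is empty and max raises ValueError. Since Python 3.4, max accepts a default= argument to return a fallback instead; a small sketch:

```python
dummy_list = [(8, 'N'), (1, 'N')]

# No 'Y' entries here: a plain max(...) over the generator would raise
# ValueError ("max() arg is an empty sequence"); default= avoids that.
result = max((first for first, second in dummy_list if second == 'Y'),
             default=None)
print(result)  # None
```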
Abdul Niyas P M

You can use pandas for this as the data you have resembles a table.

import pandas as pd

df = pd.DataFrame(dummy_list, columns=["Col 1", "Col 2"])
val_y = df[df["Col 2"] == "Y"]
max_index = val_y["Col 1"].idxmax()

print(df.loc[max_index, :])

First you convert it into a pandas DataFrame using pd.DataFrame and set the column names to Col 1 and Col 2.

Then you get all the rows inside the dataframe with Col 2 values equal to Y.

Once you have this data, just select Col 1 and apply the idxmax function on it to get the index of the maximum value for that series.

You can then pass this index to loc as the row, with : (all columns) as the column selector, to get the whole row.

This can be compressed to two lines:

max_index = df[df["Col 2"] == "Y"]["Col 1"].idxmax()
df.loc[max_index, :]

Output:

Col 1    4
Col 2    Y
Name: 1, dtype: object
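If you only need the maximum value rather than the whole row, the same lookup can be written as a single boolean-mask expression; this is a sketch reusing the Col 1 / Col 2 names from above:

```python
import pandas as pd

dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (2, 'Y')]
df = pd.DataFrame(dummy_list, columns=["Col 1", "Col 2"])

# The boolean mask selects the 'Y' rows; .max() then reduces Col 1 directly,
# without computing an index first.
print(df.loc[df["Col 2"] == "Y", "Col 1"].max())  # 4
```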
Zero
max([i[0] for i in dummy_list if i[1] == 'Y'])
TDT

max([i for i in dummy_list if i[1] == 'Y'])

output: (4, 'Y')

or


max(filter(lambda x: x[1] == 'Y', dummy_list))

output: (4, 'Y')
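Both variants return the whole tuple rather than just the number because Python compares tuples lexicographically: the first elements decide unless they tie, so after filtering to 'Y' pairs the comparison is effectively on the values. A quick illustration:

```python
# Tuples compare element by element, left to right.
assert (4, 'Y') > (2, 'Y')   # first elements differ: 4 > 2 decides
assert (4, 'Y') > (4, 'N')   # tie on first element: 'Y' > 'N' decides

pairs = [(8, 'N'), (4, 'Y'), (2, 'Y')]
print(max(p for p in pairs if p[1] == 'Y'))  # (4, 'Y')
```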
Will

By passing a key function to max, no further iterations are required: the key below ranks 'Y' pairs above 'N' pairs and then compares by value, so a single pass over the list finds the answer.

y_max = max(dummy_list, key=lambda p: (p[1] == 'Y', p[0]))[0]
print(y_max)

By decoupling the pairs and classifying them with respect to the 'Y'/'N' values:

d = {}
for k, v in dummy_list:
    d.setdefault(v, []).append(k)

y_max = max(d['Y'])

By zip-decoupling, one can use a mask-like approach via itertools.compress:

import itertools as it

values, flags = zip(*dummy_list)
y_max = max(it.compress(values, map('Y'.__eq__, flags)))
print(y_max)

A basic for-loop approach (starting from None, so an 'N' value in the first pair cannot leak into the result):

y_max = None
for i, c in dummy_list:
    if c == 'Y' and (y_max is None or i > y_max):
        y_max = i
print(y_max)

EDIT: benchmark results.

Each data list is shuffled before execution, and results are ordered from fastest to slowest. The functions tested are those given by the other answers; the identifiers should (I hope) make it easy to recognize each one.

Test repeated 100 times with 11 items (the original amount of data)

max_gen         ms: 8.184e-04
for_loop        ms: 1.033e-03
dict_classifier ms: 1.270e-03
zip_compress    ms: 1.326e-03
max_key         ms: 1.413e-03
max_filter      ms: 1.535e-03
pandas          ms: 7.405e-01

Test repeated 100 times with 110 items (10x more data)

max_key         ms: 1.497e-03
zip_compress    ms: 7.703e-03
max_filter      ms: 8.644e-03
for_loop        ms: 9.669e-03
max_gen         ms: 9.842e-03
dict_classifier ms: 1.046e-02
pandas          ms: 7.745e-01

Test repeated 100 times with 110000 items (10000x more data)

max_key         ms: 1.418e-03
max_gen         ms: 4.787e+00
max_filter      ms: 8.566e+00
dict_classifier ms: 9.116e+00
zip_compress    ms: 9.801e+00
for_loop        ms: 1.047e+01
pandas          ms: 2.614e+01

When the amount of data increases, the "performance classes" change, but max_key seems unaffected.
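The benchmark harness itself isn't shown; a minimal version along these lines (function names are my own, not the original code) reproduces the setup described above, shuffling the data and timing each candidate with timeit:

```python
import random
import timeit

def max_gen(data):
    # generator-expression answer
    return max(first for first, second in data if second == 'Y')

def for_loop(data):
    # plain-loop answer
    y_max = None
    for i, c in data:
        if c == 'Y' and (y_max is None or i > y_max):
            y_max = i
    return y_max

data = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y')]

for func in (max_gen, for_loop):
    random.shuffle(data)                    # shuffle before execution
    total = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__:10s} ms: {total * 1000 / 100:.3e}")
```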

cards