
Can pandas combine multiple lists of readings and return the maximum reading values for the elements in aoiFeatures?

Given:

# FYI: 2.4 million elements in each of these lists in reality
allFeatures =  [101, 179, 181, 183, 185, 843, 845, 847, 849, 851] 
allReadings1 = [0.27, 0.25, 0.13, 0.04, 0.05, 0.09, 0.15, 0.13, 0.12, 0.20]
allReadings2 = [0.25, 0.06, 0.29, 0.29, 0.04, 0.21, 0.07, 0.06, 0.07, 0.06]
allReadings3 = [0.12, 0.02, 0.20, 0.27, 0.04, 0.08, 0.11, 0.24, 0.00, 0.13]
allReadings4 = [0.21, 0.00, 0.22, 0.11, 0.24, 0.16, 0.11, 0.18, 0.27, 0.14]
allReadings5 = [0.02, 0.18, 0.26, 0.22, 0.23, 0.15, 0.24, 0.28, 0.00, 0.07]
allReadings6 = [0.08, 0.25, 0.21, 0.23, 0.14, 0.21, 0.18, 0.09, 0.17, 0.27]
allReadings7 = [0.20, 0.02, 0.28, 0.16, 0.18, 0.27, 0.29, 0.19, 0.29, 0.13]
allReadings8 = [0.17, 0.01, 0.07, 0.23, 0.14, 0.20, 0.19, 0.01, 0.15, 0.17]
allReadings9 = [0.12, 0.18, 0.09, 0.10, 0.00, 0.03, 0.11, 0.03, 0.14, 0.14]
allReadings10 = [0.13, 0.03, 0.20, 0.13, 0.30, 0.30, 0.28, 0.12, 0.19, 0.22]

# FYI: 67,000 elements in this list in reality
aoiFeatures = [181, 843, 849]

Result:

181 0.29
843 0.30
849 0.29
  • @jezrael - Pandas loc[] times: Load 42MM elements: 0:01:07.8; Build max for 67K elements: 0:00:04.8 – JackedUpDBA May 22 '18 at 15:45
  • @piRSquared - Option #1 python max() times: Load 42MM elements: 0:00:02.2; Build max for 67K elements: 0:01:41.8 – JackedUpDBA May 22 '18 at 16:14
  • @piRSquared - Option #2 numpy max() times: Load 42MM elements: 0:00:02.2; Build max for 67K elements: 0:00:00.6 – JackedUpDBA May 22 '18 at 16:15
  • So, guys... why did it turn out this way? (Running 64-bit Python 3.6 on Windows 10; 8 i7 cores @ 3.6 GHz; 24 GB memory) – JackedUpDBA May 22 '18 at 16:16
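
For anyone who wants to reproduce a comparison like the one in the comments above, a minimal timing sketch with timeit on the small sample lists (the figures quoted in the comments came from the full ~42-million-element data, so absolute numbers will differ):

# Illustrative benchmark of the three approaches from the answers below.
# Assumes allFeatures, allReadings1..allReadings10 and aoiFeatures are defined as above.
import timeit
import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

# pandas loc[] approach: build the frame once, then query it
df = pd.DataFrame(list(zip(*readings)), index=allFeatures)
t_loc = timeit.timeit(lambda: df.loc[aoiFeatures].max(axis=1), number=1000)

# pure-Python max() approach
t_py = timeit.timeit(
    lambda: pd.Series(dict(zip(allFeatures, map(max, zip(*readings)))))[aoiFeatures],
    number=1000)

# NumPy max() approach
t_np = timeit.timeit(
    lambda: pd.Series(np.max(readings, 0), allFeatures)[aoiFeatures],
    number=1000)

print(f"loc: {t_loc:.3f}s  python max: {t_py:.3f}s  numpy max: {t_np:.3f}s")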

3 Answers


First zip all the lists together, pass them to the DataFrame constructor with allFeatures as the index parameter, then select the rows of interest with loc and take the max of each row:

import pandas as pd

L = list(zip(allReadings1,
             allReadings2,
             allReadings3,
             allReadings4,
             allReadings5,
             allReadings6,
             allReadings7,
             allReadings8,
             allReadings9,
             allReadings10))

df = pd.DataFrame(L, index=allFeatures)
print (df)
        0     1     2     3     4     5     6     7     8     9
101  0.27  0.25  0.12  0.21  0.02  0.08  0.20  0.17  0.12  0.13
179  0.25  0.06  0.02  0.00  0.18  0.25  0.02  0.01  0.18  0.03
181  0.13  0.29  0.20  0.22  0.26  0.21  0.28  0.07  0.09  0.20
183  0.04  0.29  0.27  0.11  0.22  0.23  0.16  0.23  0.10  0.13
185  0.05  0.04  0.04  0.24  0.23  0.14  0.18  0.14  0.00  0.30
843  0.09  0.21  0.08  0.16  0.15  0.21  0.27  0.20  0.03  0.30
845  0.15  0.07  0.11  0.11  0.24  0.18  0.29  0.19  0.11  0.28
847  0.13  0.06  0.24  0.18  0.28  0.09  0.19  0.01  0.03  0.12
849  0.12  0.07  0.00  0.27  0.00  0.17  0.29  0.15  0.14  0.19
851  0.20  0.06  0.13  0.14  0.07  0.27  0.13  0.17  0.14  0.22

aoiFeatures = [181, 843, 849]
s = df.loc[aoiFeatures].max(axis=1)
print (s)
181    0.29
843    0.30
849    0.29
dtype: float64
jezrael
  • LMAO... Alas, I expose my struggle building that dataframe for your amusement: # L = np.transpose(list(zip(allReadings1,allReadings2,allReadings3,allReadings4,allReadings5,allReadings6,allReadings7,allReadings8,allReadings9,allReadings10))) 15 minutes of my life I'll never get back, ugh! – JackedUpDBA May 22 '18 at 12:24
  • @Jack you're lucky it was only 15 minutes. – piRSquared May 22 '18 at 12:50
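
As an aside on building that frame: the same result can be had without the zip/transpose dance by stacking the lists as columns with NumPy (a minimal equivalent sketch, not part of the original answer):

import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

# np.column_stack treats each list as one column, so this matches
# pd.DataFrame(list(zip(...)), index=allFeatures) above
df = pd.DataFrame(np.column_stack(readings), index=allFeatures)
s = df.loc[aoiFeatures].max(axis=1)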

Option 1

You can let Python's built-in max do the work and use a pandas.Series to hold the results:

import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

s = pd.Series(dict(zip(allFeatures, map(max, zip(*readings)))))
s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64

Option 2

Or leverage NumPy:

import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

s = pd.Series(np.max(readings, 0), allFeatures)
s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64

If you need to update the array of maximums with a new reading:

allReadings11 =[0.13, 0.03, 0.30, 0.13, 0.30, 0.30, 0.28, 0.12, 0.19, 0.22]
s[:] = np.maximum(s, allReadings11)

s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64
piRSquared
  • I'm wondering now which would be the fastest... I'm guessing Option #2. History: I found out how slow using pyodbc+cursor with a stored proc in SQL/Server is (32 minutes), so anything is an improvement at this point! I'll get the data (see FYI's) into all of these solutions and let you guys know. – JackedUpDBA May 22 '18 at 13:15
  • @Jack sounds like you may want to check https://stackoverflow.com/questions/71022/sql-max-of-multiple-columns and see if that offers anything you could use if your starting point is a SQL table... – Jon Clements May 22 '18 at 13:38
  • @Jon Clements - Thanks! Starting point is a collection of masked arrays. – JackedUpDBA May 22 '18 at 15:55
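
Since the comments mention that the real starting point is a collection of masked arrays, here is a minimal sketch (an assumption, not part of the answer) of the same NumPy-style maximum with numpy.ma, which ignores masked entries; the feature IDs and values below are illustrative only:

import numpy as np
import numpy.ma as ma
import pandas as pd

# illustrative raw data; np.nan stands in for masked readings
raw = np.array([[0.27, np.nan, 0.13],
                [0.25, 0.06, np.nan]])
stacked = ma.masked_invalid(raw)       # mask the missing readings

features = [101, 179, 181]             # illustrative feature IDs
s = pd.Series(stacked.max(axis=0), index=features)
print(s[[101, 181]])                   # select the features of interest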

A very simple and quick one-liner: build the frame with the features as columns, take the column-wise maximum, and select the features of interest:

import pandas as pd

pd.DataFrame([allReadings1, allReadings2, ...], columns=allFeatures).max()[aoiFeatures]

Sample output:

181    0.29
843    0.30
849    0.29
dtype: float64

A.Kot