
Can pandas combine multiple lists of readings and return the maximum reading values for the elements in aoiFeatures?

Given:

# FYI: 2.4 million elements in each of these lists in reality
allFeatures =  [101, 179, 181, 183, 185, 843, 845, 847, 849, 851] 
allReadings1 = [0.27, 0.25, 0.13, 0.04, 0.05, 0.09, 0.15, 0.13, 0.12, 0.20]
allReadings2 = [0.25, 0.06, 0.29, 0.29, 0.04, 0.21, 0.07, 0.06, 0.07, 0.06]
allReadings3 = [0.12, 0.02, 0.20, 0.27, 0.04, 0.08, 0.11, 0.24, 0.00, 0.13]
allReadings4 = [0.21, 0.00, 0.22, 0.11, 0.24, 0.16, 0.11, 0.18, 0.27, 0.14]
allReadings5 = [0.02, 0.18, 0.26, 0.22, 0.23, 0.15, 0.24, 0.28, 0.00, 0.07]
allReadings6 = [0.08, 0.25, 0.21, 0.23, 0.14, 0.21, 0.18, 0.09, 0.17, 0.27]
allReadings7 = [0.20, 0.02, 0.28, 0.16, 0.18, 0.27, 0.29, 0.19, 0.29, 0.13]
allReadings8 = [0.17, 0.01, 0.07, 0.23, 0.14, 0.20, 0.19, 0.01, 0.15, 0.17]
allReadings9 = [0.12, 0.18, 0.09, 0.10, 0.00, 0.03, 0.11, 0.03, 0.14, 0.14]
allReadings10 = [0.13, 0.03, 0.20, 0.13, 0.30, 0.30, 0.28, 0.12, 0.19, 0.22]

# FYI: 67,000 elements in this list in reality
aoiFeatures = [181, 843, 849]

Result:

181 0.29
843 0.30
849 0.29
  • @jezrael - Pandas loc[] times: Load 42MM elements: 0:01:07.8; Build max for 67K elements: 0:00:04.8 – JackedUpDBA May 22 '18 at 15:45
  • @piRSquared - Option #1 python max() times: Load 42MM elements: 0:00:02.2; Build max for 67K elements: 0:01:41.8 – JackedUpDBA May 22 '18 at 16:14
  • @piRSquared - Option #2 numpy max() times: Load 42MM elements: 0:00:02.2; Build max for 67K elements: 0:00:00.6 – JackedUpDBA May 22 '18 at 16:15
  • So, guys... why did it turn out this way? (Running 64-bit Python 3.6 on Windows 10; 8 i7 cores @ 3.6 GHz; 24 GB memory) – JackedUpDBA May 22 '18 at 16:16
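
For anyone who wants to reproduce a comparison like the one in the comments above, a minimal timing sketch with timeit on the small sample lists (the figures quoted in the comments came from the full ~42-million-element data, so absolute numbers will differ):

# Illustrative benchmark of the three approaches from the answers below.
# Assumes allFeatures, allReadings1..allReadings10 and aoiFeatures are defined as above.
import timeit
import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

# pandas loc[] approach: build the frame once, then query it
df = pd.DataFrame(list(zip(*readings)), index=allFeatures)
t_loc = timeit.timeit(lambda: df.loc[aoiFeatures].max(axis=1), number=1000)

# pure-Python max() approach
t_py = timeit.timeit(
    lambda: pd.Series(dict(zip(allFeatures, map(max, zip(*readings)))))[aoiFeatures],
    number=1000)

# NumPy max() approach
t_np = timeit.timeit(
    lambda: pd.Series(np.max(readings, 0), allFeatures)[aoiFeatures],
    number=1000)

print(f"loc: {t_loc:.3f}s  python max: {t_py:.3f}s  numpy max: {t_np:.3f}s")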

3 Answers


First zip all the lists together, pass them to the DataFrame constructor with allFeatures as the index parameter, then select the rows of interest with loc and take the max of each row:

import pandas as pd

L = list(zip(allReadings1,
             allReadings2,
             allReadings3,
             allReadings4,
             allReadings5,
             allReadings6,
             allReadings7,
             allReadings8,
             allReadings9,
             allReadings10))

df = pd.DataFrame(L, index=allFeatures)
print (df)
        0     1     2     3     4     5     6     7     8     9
101  0.27  0.25  0.12  0.21  0.02  0.08  0.20  0.17  0.12  0.13
179  0.25  0.06  0.02  0.00  0.18  0.25  0.02  0.01  0.18  0.03
181  0.13  0.29  0.20  0.22  0.26  0.21  0.28  0.07  0.09  0.20
183  0.04  0.29  0.27  0.11  0.22  0.23  0.16  0.23  0.10  0.13
185  0.05  0.04  0.04  0.24  0.23  0.14  0.18  0.14  0.00  0.30
843  0.09  0.21  0.08  0.16  0.15  0.21  0.27  0.20  0.03  0.30
845  0.15  0.07  0.11  0.11  0.24  0.18  0.29  0.19  0.11  0.28
847  0.13  0.06  0.24  0.18  0.28  0.09  0.19  0.01  0.03  0.12
849  0.12  0.07  0.00  0.27  0.00  0.17  0.29  0.15  0.14  0.19
851  0.20  0.06  0.13  0.14  0.07  0.27  0.13  0.17  0.14  0.22

aoiFeatures = [181, 843, 849]
s = df.loc[aoiFeatures].max(axis=1)
print (s)
181    0.29
843    0.30
849    0.29
dtype: float64
jezrael
  • LMAO... Alas, I expose my struggle building that dataframe for your amusement: # L = np.transpose(list(zip(allReadings1,allReadings2,allReadings3,allReadings4,allReadings5,allReadings6,allReadings7,allReadings8,allReadings9,allReadings10))) 15 minutes of my life I'll never get back, ugh! – JackedUpDBA May 22 '18 at 12:24
  • @Jack you're lucky it was only 15 minutes. – piRSquared May 22 '18 at 12:50
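
As an aside on building that frame: the same result can be had without the zip/transpose dance by stacking the lists as columns with NumPy (a minimal equivalent sketch, not part of the original answer):

import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

# np.column_stack treats each list as one column, so this matches
# pd.DataFrame(list(zip(...)), index=allFeatures) above
df = pd.DataFrame(np.column_stack(readings), index=allFeatures)
s = df.loc[aoiFeatures].max(axis=1)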

Option 1

You can let Python's built-in max do the work and use a pandas.Series to hold the results:

import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

s = pd.Series(dict(zip(allFeatures, map(max, zip(*readings)))))
s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64

Option 2

Or leverage NumPy:

import numpy as np
import pandas as pd

readings = [allReadings1, allReadings2, allReadings3, allReadings4, allReadings5,
            allReadings6, allReadings7, allReadings8, allReadings9, allReadings10]

s = pd.Series(np.max(readings, 0), allFeatures)
s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64

If you need to update the array of maximums with a new reading:

allReadings11 =[0.13, 0.03, 0.30, 0.13, 0.30, 0.30, 0.28, 0.12, 0.19, 0.22]
s[:] = np.maximum(s, allReadings11)

s[aoiFeatures]

181    0.29
843    0.30
849    0.29
dtype: float64
piRSquared
  • I'm wondering now which would be the fastest... I'm guessing Option #2. History: I found out how slow using pyodbc+cursor with a stored proc in SQL/Server is (32 minutes), so anything is an improvement at this point! I'll get the data (see FYI's) into all of these solutions and let you guys know. – JackedUpDBA May 22 '18 at 13:15
  • @Jack sounds like you may want to check https://stackoverflow.com/questions/71022/sql-max-of-multiple-columns and see if that offers anything you could use if your starting point is a SQL table... – Jon Clements May 22 '18 at 13:38
  • @Jon Clements - Thanks! Starting point is a collection of masked arrays. – JackedUpDBA May 22 '18 at 15:55
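
Since the comments mention that the real starting point is a collection of masked arrays, here is a minimal sketch (an assumption, not part of the answer) of the same NumPy-style maximum with numpy.ma, which ignores masked entries; the feature IDs and values below are illustrative only:

import numpy as np
import numpy.ma as ma
import pandas as pd

# illustrative raw data; np.nan stands in for masked readings
raw = np.array([[0.27, np.nan, 0.13],
                [0.25, 0.06, np.nan]])
stacked = ma.masked_invalid(raw)       # mask the missing readings

features = [101, 179, 181]             # illustrative feature IDs
s = pd.Series(stacked.max(axis=0), index=features)
print(s[[101, 181]])                   # select the features of interest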

A very simple and quick one-liner: build the frame with the features as columns, take the column-wise maximum, and select the features of interest:

import pandas as pd

pd.DataFrame([allReadings1, allReadings2, ...], columns=allFeatures).max()[aoiFeatures]

Sample output:

181    0.29
843    0.30
849    0.29
dtype: float64

A.Kot