How to speed up this for loop in Python

Question

I'm dealing with two sets of three large lists of the same size, containing longitude, latitude and altitude coordinates in UTM format (see the lists below). The two sets contain overlapping coordinates (i.e. points whose longitude and latitude values are equal). If the values in Lon are equal to Lon2 and the values in Lat are equal to Lat2 then I want to calculate the mean altitude at those indexes. However, if they're not equal then the longitude, latitude and altitude values will remain. I only want to reduce the overlapping data to one set of longitude and latitude coordinates and calculate the mean altitude at those coordinates.

This is my attempt so far

 import numpy as np

 Lon = [450000.50,459000.50,460000,470000]
 Lat = [5800000.50,459000.50,500000,470000]
 Alt = [-1,-9,-2,1]
 Lon2 = [450000.50,459000.50,460000,470000]
 Lat2 = [5800000.50,459000.50,800000,470000]
 Alt2= [-3,-1,-20,2]

 MeanAlt = []
 appendAlt = MeanAlt.append
 LonOverlap = []
 appendLon = LonOverlap.append
 LatOverlap = []
 appendLat = LatOverlap.append

 for i, a in enumerate(Lon and Lat and Alt):
     for j, b in enumerate(Lon2 and Lat2 and Alt2):
         if Lon[i]==Lon2[j] and Lat[i]==Lat2[j]:
             MeanAltData = (Alt[i]+Alt2[j])/2
             appendAlt(MeanAltData)
             LonOverlapData = Lon[i]
             appendLat(LonOverlapData)
             LatOverlapData = Lat[i]
             appendLon(LatOverlapData)

 print(MeanAlt) # correct ans should be MeanAlt = [-2.0,-5,1.5]
 print(LonOverlap)
 print(LatOverlap)

I'm working in a Jupyter notebook and my laptop is rather slow, so I need to make this code much more efficient. I would appreciate any help on this. Thank you :)

Why is the third `MeanAlt` -5? `Lat[2]!=Lat2[2]` and so, according to your problem formulation _"...if they're not equal then the longitude, latitude and **altitude values will remain**"_. What does _"altitude values will remain"_ mean and how did this affect `MeanAlt[2]`? — AGN Gazer, Jul 21 '17 at 15:59
Ah, it seems that you are *dropping values* corresponding to unequal Lon or Lat. Please confirm. — AGN Gazer, Jul 21 '17 at 16:01
Why do you have `import numpy as np` at the beginning and you never use `np` anywhere in your code? — AGN Gazer, Jul 21 '17 at 18:08

P. Shark · Answer 1 · 2017-07-21T14:44:35.740

I believe your code can be improved in 2 ways:

Firstly, use tuples instead of lists, as iterating over a tuple is generally faster than iterating over a list.
Secondly, your for loops can be reduced to a single loop that iterates over the indices of the tuples you are going to read. Of course, this only works if all your tuples contain the same number of items (i.e.: len(Lat) == len(Lon) == len(Alt) == len(Lat2) == len(Lon2) == len(Alt2)) and if overlapping coordinates always sit at the same index in both sets.

Here is the improved code (I took the liberty of removing the import numpy statement as it was not being used in the piece of code you provided):

# use of tuples
Lon = (450000.50, 459000.50, 460000, 470000)
Lat = (5800000.50, 459000.50, 500000, 470000)
Alt = (-1, -9, -2, 1)
Lon2 = (40000.50, 459000.50, 460000, 470000)
Lat2 = (5800000.50, 459000.50, 800000, 470000)
Alt2 = (-3, -1, -20, 2)

MeanAlt = []
appendAlt = MeanAlt.append
LonOverlap = []
appendLon = LonOverlap.append
LatOverlap = []
appendLat = LatOverlap.append

# only one loop
for i in range(len(Lon)):
    if (Lon[i] == Lon2[i]) and (Lat[i] == Lat2[i]):
        MeanAltData = (Alt[i] + Alt2[i]) / 2
        appendAlt(MeanAltData)
        LonOverlapData = Lon[i]
        appendLon(LonOverlapData)  # longitude goes into LonOverlap
        LatOverlapData = Lat[i]
        appendLat(LatOverlapData)  # latitude goes into LatOverlap

print(MeanAlt)  # correct ans should be MeanAlt = [-2.0,-5,1.5]
print(LonOverlap)
print(LatOverlap)

I executed this program 1 million times on my laptop. My version took 1.41 seconds in total, whereas your approach took 4.01 seconds.
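
A comparison like this can be reproduced with the standard-library `timeit` module. The following is only a minimal sketch: the function names `nested_loops` and `single_loop` are hypothetical wrappers around the two approaches, the repetition count is arbitrary, and absolute timings will differ from machine to machine.

```python
import timeit

Lon = (450000.50, 459000.50, 460000, 470000)
Lat = (5800000.50, 459000.50, 500000, 470000)
Alt = (-1, -9, -2, 1)
Lon2 = (40000.50, 459000.50, 460000, 470000)
Lat2 = (5800000.50, 459000.50, 800000, 470000)
Alt2 = (-3, -1, -20, 2)


def nested_loops():
    """Compare every point of the first set with every point of the second."""
    means = []
    for i in range(len(Lon)):
        for j in range(len(Lon2)):
            if Lon[i] == Lon2[j] and Lat[i] == Lat2[j]:
                means.append((Alt[i] + Alt2[j]) / 2)
    return means


def single_loop():
    """Compare only the points that share the same index."""
    means = []
    for i in range(len(Lon)):
        if Lon[i] == Lon2[i] and Lat[i] == Lat2[i]:
            means.append((Alt[i] + Alt2[i]) / 2)
    return means


n = 10_000  # arbitrary repetition count, increase for steadier numbers
print("nested loops:", timeit.timeit(nested_loops, number=n))
print("single loop: ", timeit.timeit(single_loop, number=n))
```

Note that the two functions only return the same result when matching coordinates sit at the same index, which happens to be true for this sample data.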

Also, I believe there is a mistake in your expected result of `MeanAlt`, since it should contain **only 2** elements as the only items whose coordinates match are the _second_ and the _fourth_. — P. Shark, Jul 21 '17 at 14:52
Your code only works under the (unlikely) assumption that the overlapping lat/lon are guaranteed to be at the same position in both lists. — knipknap, Jul 21 '17 at 15:06
Hi P.Shark, thanks for answering. You're right, it was my mistake in the first value I entered for Lon2. You're also right in assuming all lat/lon are in the same position. Thank you! — TimeExplorer, Jul 21 '17 at 16:00
@TimeExplorer glad I was of help. While it is true that my solution only works in the scenario described by **knipknap**, from the way your code was written I assumed that you were only interested in comparing elements in the same position. In any case, for even greater performance (and if your hardware supports it) you could consider taking advantage of the `multiprocessing` package ([link](https://docs.python.org/2/library/multiprocessing.html)) — P. Shark, Jul 21 '17 at 20:38

score 1 · Answer 2 · answered Jul 21 '17 at 15:04

This is not 100% functionally equivalent, but I am guessing it is closer to what you actually want:

Lon = [450000.50,459000.50,460000,470000]
Lat = [5800000.50,459000.50,500000,470000]
Alt = [-1,-9,-2,1]
Lon2 = [40000.50,459000.50,460000,470000]
Lat2 = [5800000.50,459000.50,800000,470000]
Alt2= [-3,-1,-20,2]

MeanAlt = []
appendAlt = MeanAlt.append
LonOverlap = []
appendLon = LonOverlap.append
LatOverlap = []
appendLat = LatOverlap.append

ll = dict((str(la)+'/'+str(lo), al) for (la, lo, al) in zip(Lat, Lon, Alt))

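# unpack in the same (lat, lon) order that was used to build the dict keys
for la, lo, al in zip(Lat2, Lon2, Alt2):
    al2 = ll.get(str(la)+'/'+str(lo))
    if al2 is not None:  # an altitude of 0 is still a valid match
        MeanAltData = (al+al2)/2
        appendAlt(MeanAltData)
        LonOverlapData = lo
        appendLon(LonOverlapData)
        LatOverlapData = la
        appendLat(LatOverlapData)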

print(MeanAlt) # correct ans should be MeanAlt = [-2.0,-5,1.5]
print(LonOverlap)
print(LatOverlap)

Or simpler:

Lon = [450000.50,459000.50,460000,470000]
Lat = [5800000.50,459000.50,500000,470000]
Alt = [-1,-9,-2,1]

Lon2 = [40000.50,459000.50,460000,470000]
Lat2 = [5800000.50,459000.50,800000,470000]
Alt2= [-3,-1,-20,2]

ll = dict((str(la)+'/'+str(lo), al) for (la, lo, al) in zip(Lat, Lon, Alt))

result = []
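# unpack in the same (lat, lon) order that was used to build the dict keys
for la, lo, al in zip(Lat2, Lon2, Alt2):
    al2 = ll.get(str(la)+'/'+str(lo))
    if al2 is not None:  # an altitude of 0 is still a valid match
        result.append((la, lo, (al+al2)/2))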

print(result)

In practice, I would try to start with better structured input data, making the conversion to a dict, or at the very least the "zip()", unnecessary.
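
As one possible illustration of "better structured" input, each data set could be kept as a dict keyed by the (lon, lat) pair from the start. This is only a sketch under assumptions: the names `survey1` and `survey2` are made up for the example, coordinates are assumed to be unique within each data set, and matching relies on exact float equality, as in the code above.

```python
# Each data set maps a (lon, lat) pair directly to its altitude,
# so no zip() or string concatenation is needed to build lookup keys.
survey1 = {
    (450000.50, 5800000.50): -1,
    (459000.50, 459000.50): -9,
    (460000, 500000): -2,
    (470000, 470000): 1,
}
survey2 = {
    (40000.50, 5800000.50): -3,
    (459000.50, 459000.50): -1,
    (460000, 800000): -20,
    (470000, 470000): 2,
}

# dict key views support set operations, so the overlap is an intersection
overlap = survey1.keys() & survey2.keys()

# one (lon, lat, mean altitude) tuple per overlapping point, sorted for stable output
result = sorted(
    (lon, lat, (survey1[(lon, lat)] + survey2[(lon, lat)]) / 2)
    for lon, lat in overlap
)
print(result)
```

With this layout the position of a point inside the data set no longer matters, only its coordinates.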

Hi, thanks for answering. I like how you arranged the data at the end. However it only extracts the coordinates when both Lon and Lat are equal. I need the mean to be calculated when Lon and Lon2 are equal but can be different numbers to Lat and Lat2 which should be equal. For example: Lon = [450000.50] Lon2 = [450000.50] Lat= [5800000.50] Lat2=[5800000.50]. I'm not sure how to implement this in the code you've given hehe. Also, if you have suggestions for a better way of arranging the input data, please let me know. I only set it up in lists because I'm most familiar with them. — TimeExplorer, Jul 21 '17 at 15:45

AGN Gazer · Answer 3 · 2017-07-21T17:59:12.403

Use numpy to vectorize the computation. For arrays with 1,000,000 elements, execution time should be on the order of 15-25 ms if the inputs are already numpy.ndarrays and ~140 ms if the inputs are Python lists.

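import numpy as np

def mean_alt(lon, lon2, lat, lat2, alt, alt2):
    # convert the inputs to arrays (no copy is made if they already are ndarrays)
    lon = np.asarray(lon)
    lon2 = np.asarray(lon2)
    lat = np.asarray(lat)
    lat2 = np.asarray(lat2)
    alt = np.asarray(alt)
    alt2 = np.asarray(alt2)
    # indices where both coordinates match at the same position
    ind = np.where((lon == lon2) & (lat == lat2))
    # average the two altitudes at the matching positions
    mean_alt = (0.5 * (alt[ind] + alt2[ind])).tolist()
    return (lon[ind].tolist(), lat[ind].tolist(), mean_alt)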
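
A short usage sketch with the sample lists from the question may help. The function is repeated here only so that the snippet runs on its own; the output variable names `lon_out`, `lat_out` and `mean_out` are made up for the example. Like the function itself, this assumes that overlapping points sit at the same index in both data sets.

```python
import numpy as np


def mean_alt(lon, lon2, lat, lat2, alt, alt2):
    lon, lon2 = np.asarray(lon), np.asarray(lon2)
    lat, lat2 = np.asarray(lat), np.asarray(lat2)
    alt, alt2 = np.asarray(alt), np.asarray(alt2)
    # positions where longitude and latitude both match
    ind = np.where((lon == lon2) & (lat == lat2))
    means = (0.5 * (alt[ind] + alt2[ind])).tolist()
    return lon[ind].tolist(), lat[ind].tolist(), means


# sample data as posted in the question
Lon = [450000.50, 459000.50, 460000, 470000]
Lat = [5800000.50, 459000.50, 500000, 470000]
Alt = [-1, -9, -2, 1]
Lon2 = [450000.50, 459000.50, 460000, 470000]
Lat2 = [5800000.50, 459000.50, 800000, 470000]
Alt2 = [-3, -1, -20, 2]

lon_out, lat_out, mean_out = mean_alt(Lon, Lon2, Lat, Lat2, Alt, Alt2)
print(mean_out)  # [-2.0, -5.0, 1.5]
print(lon_out)   # [450000.5, 459000.5, 470000.0]
print(lat_out)   # [5800000.5, 459000.5, 470000.0]
```

The third point is dropped because `Lat[2]` and `Lat2[2]` differ, which gives the three means the question expects.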
