8

Is there an efficient way of creating a 2D array of the values from unsorted coordinate points (i.e. not all lons and/or lats are ascending or descending) without using loops?

Example Data

lats = np.array([45.5,45.5,45.5,65.3,65.3,65.3,43.2,43.2,43.2,65.3])
lons = np.array([102.5,5.5,116.2,102.5,5.5,116.2,102.5,5.5,116.2,100])
vals = np.array([3,4,5,6,7,7,9,1,0,4])

Example Output
Each column represents a unique longitude (102.5, 5.5, 116.2, & 100) and each row represents a unique latitude (45.5, 65.3, & 43.2).

([ 3, 4, 5, NaN],
 [ 6, 7, 7, 4],
 [ 9, 1, 0, NaN])

It isn't so straightforward, though, because I don't necessarily know how many duplicates of each lon or lat there are, and that determines the shape of the array.

Update:
I had the data arranged incorrectly for my question. I have arranged it now, so they are all unique pairs and there is an additional data point to demonstrate how the data should be arranged when NaNs are present.

ryanjdillon
  • What dictates the size of the output array? The number of non-duplicating values in `lats` and `lons`? – danodonovan Feb 27 '13 at 18:16
  • That's right... I think :) – ryanjdillon Feb 27 '13 at 18:31
  • Can you explain in words the specifications that make the example output the desired answer? What is the logic that indicates a 100 should be placed in the output when 100 is not a value in `vals`? And why there? – unutbu Mar 01 '13 at 11:07
  • That was just a mistake, my apologies. It should be one additional value that I did not put in the value array. Correcting that now. – ryanjdillon Mar 01 '13 at 15:47

3 Answers

5

The example you have posted makes very little sense, and it doesn't allow any reasonable way to specify missing data. I am guessing here, but the only reasonable thing you may be dealing with seems to be something like this:

>>> lats = np.array([43.2, 43.2, 43.2, 45.5, 45.5, 45.5, 65.3, 65.3, 65.3])
>>> lons = np.array([5.5, 102.5, 116.2, 5.5, 102.5, 116.2, 5.5, 102.5, 116.2])
>>> vals = np.array([3, 4, 5, 6, 7, 7, 9, 1, 0])

Here the value in vals[j] comes from latitude lats[j] and longitude lons[j], but the data may come scrambled, as in:

>>> indices = np.arange(9)
>>> np.random.shuffle(indices)
>>> lats = lats[indices]
>>> lons = lons[indices]
>>> vals = vals[indices]
>>> lats
array([ 45.5,  43.2,  65.3,  45.5,  43.2,  65.3,  45.5,  65.3,  43.2])
>>> lons
array([   5.5,  116.2,  102.5,  116.2,    5.5,  116.2,  102.5,    5.5,  102.5])
>>> vals
array([6, 5, 1, 7, 3, 0, 7, 9, 4])

You can get this arranged into an array as follows:

>>> lat_vals, lat_idx = np.unique(lats, return_inverse=True)
>>> lon_vals, lon_idx = np.unique(lons, return_inverse=True)
>>> vals_array = np.empty(lat_vals.shape + lon_vals.shape)
>>> vals_array.fill(np.nan) # or whatever your desired missing data flag is
>>> vals_array[lat_idx, lon_idx] = vals
>>> vals_array
array([[ 3.,  4.,  5.],
       [ 6.,  7.,  7.],
       [ 9.,  1.,  0.]])
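
Applied to the sample data in the question, where longitude 100 occurs at only one latitude, the same steps leave NaN in every cell that has no measurement. The following is a minimal sketch; the helper name `grid_from_points` is made up for illustration, and it assumes each (lat, lon) pair occurs at most once. Rows and columns come out in ascending order because `np.unique` sorts its output, so the layout differs from the order of first appearance used in the question.

```python
import numpy as np

def grid_from_points(lats, lons, vals, fill=np.nan):
    """Scatter vals onto a (unique lats) x (unique lons) grid (hypothetical helper)."""
    # The inverse indices give each point's row/column in the sorted unique arrays
    lat_vals, lat_idx = np.unique(lats, return_inverse=True)
    lon_vals, lon_idx = np.unique(lons, return_inverse=True)
    # Start from a grid of missing-data flags, then assign the known points
    grid = np.full((lat_vals.size, lon_vals.size), fill, dtype=float)
    grid[lat_idx, lon_idx] = vals
    return lat_vals, lon_vals, grid

lats = np.array([45.5, 45.5, 45.5, 65.3, 65.3, 65.3, 43.2, 43.2, 43.2, 65.3])
lons = np.array([102.5, 5.5, 116.2, 102.5, 5.5, 116.2, 102.5, 5.5, 116.2, 100])
vals = np.array([3, 4, 5, 6, 7, 7, 9, 1, 0, 4])

lat_vals, lon_vals, grid = grid_from_points(lats, lons, vals)
print(lat_vals)
print(lon_vals)
print(grid)
```

Here the column for longitude 100 is NaN except in the row for latitude 65.3, which holds the extra value 4.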
Jaime
  • Thanks Jaime. This is also an excellent answer and is very helpful. My apologies for the poor example. I find it difficult sometimes to refine my question without adding unnecessary bits to confuse things. – ryanjdillon Feb 28 '13 at 10:28
  • I think I get where things weren't making sense. The lat/lon/value data should be unique pairs and are all consistent in their ordering; though, no one list is in strictly ascending or descending order. I've reordered things correctly (so they are unique) and added a value to demonstrate how the output should be when NaNs are present. Thanks for the help! – ryanjdillon Mar 01 '13 at 10:19
  • @shootingstars Your edited sample input is still not consistent with your expected output. But I am more convinced now that what I propose above is what you want. Try it on your sample input (after appending a `100` to the `vals` array!), see what you get, understand why it's different from what you were expecting, and I think you will eventually realize the above method is the way to go. – Jaime Mar 01 '13 at 14:35
  • Yep, that works fantastically, and it seems a bit clearer than using views. The `100` in my output array and lack of a new value in my `vals` array was a paper-to-keyboard error :) Sorry for my misunderstandings and typos, and thanks again for the help! This was enlightening. – ryanjdillon Mar 01 '13 at 16:24
  • Thank you Jaime, I've spent hours looking for a solution like this! – balu Nov 12 '14 at 14:36
  • I am using this trick to transform points in (x,y,z) format into something that I can plot with `matplotlib.pyplot.contour` and it seems to me that `vals_array` comes out transposed. So `contour(lat_vals, lon_vals, vals_array)` cannot be plotted whereas `contour(lat_vals, lon_vals, vals_array.transpose())` can. Note that in my case lon and lat have different numbers of unique values. I am very puzzled by this behaviour; possibly I am making some mistake. – Rho Phi Aug 14 '16 at 14:16
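
On the transposition raised in the comment by Rho Phi: `vals_array` has shape `(len(lat_vals), len(lon_vals))`, so rows are latitudes and columns are longitudes. `matplotlib.pyplot.contour(X, Y, Z)` expects `Z` with `len(Y)` rows and `len(X)` columns, which suggests passing the longitudes first, as in `contour(lon_vals, lat_vals, vals_array)`, rather than transposing. The sketch below only checks the shapes with `np.meshgrid`; the coordinate values are taken from the question's sample data and the zero-filled `vals_array` is a stand-in.

```python
import numpy as np

lat_vals = np.array([43.2, 45.5, 65.3])          # 3 unique latitudes (rows)
lon_vals = np.array([5.5, 100.0, 102.5, 116.2])  # 4 unique longitudes (columns)
vals_array = np.zeros((lat_vals.size, lon_vals.size))  # stand-in for the real grid

# With the default indexing='xy', meshgrid(x, y) returns arrays of shape (len(y), len(x))
lon_grid, lat_grid = np.meshgrid(lon_vals, lat_vals)
print(lon_grid.shape, vals_array.shape)

# Swapping the arguments gives the transposed shape, which no longer matches
wrong_grid, _ = np.meshgrid(lat_vals, lon_vals)
print(wrong_grid.shape)
```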
1

If you're creating a 2D array, then all arrays will have to have the same number of points. If this is true, you can simply do

out = np.vstack((lats, lons, vals))

edit

I think this might be what you're after, it matches your question at least :)

xsize = len(np.unique(lats))
ysize = len(np.unique(lons))

and then if your data is very well behaved

out = [vals[i] for i, (x, y) in enumerate(zip(lats, lons))]
out = np.asarray(out).reshape((xsize, ysize))
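
One loop-free reading of the "well behaved" case is that every latitude/longitude pair occurs exactly once. Sorting the points by latitude and then by longitude then puts `vals` in row-major order, and a plain `reshape` works. The sketch below assumes exactly that and uses `np.lexsort`, which is not part of the answer above. With missing or duplicate pairs, the `reshape` either raises a `ValueError` or puts values in the wrong cells.

```python
import numpy as np

# Scrambled but complete 3 x 3 set of points
lats = np.array([45.5, 43.2, 65.3, 45.5, 43.2, 65.3, 45.5, 65.3, 43.2])
lons = np.array([5.5, 116.2, 102.5, 116.2, 5.5, 116.2, 102.5, 5.5, 102.5])
vals = np.array([6, 5, 1, 7, 3, 0, 7, 9, 4])

xsize = len(np.unique(lats))
ysize = len(np.unique(lons))

# lexsort treats the LAST key as the primary one: sort by lats, then by lons
order = np.lexsort((lons, lats))
out = vals[order].reshape((xsize, ysize))
print(out)
```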
danodonovan
  • I think I phrased it poorly, but I would like to end up with an array of dimension (len(lats), len(lon)) containing only the values for their respective coordinates. – ryanjdillon Feb 27 '13 at 18:01
  • But `lats` and `lons` aren't integer values - so they won't fit neatly into a grid of size `(max(lats), max(lon))` have I missed something? – danodonovan Feb 27 '13 at 18:05
  • sorry, these are lists, so it would be the length of the lists (i.e. integer number of elements), but I just realized that what I really want is len(lats)/number of duplicate lats, etc. See my update to the question. – ryanjdillon Feb 27 '13 at 18:11
1
import numpy as np

lats = np.array([45.5,45.5,45.5,65.3,65.3,65.3,43.2,43.2,43.2,65.3])
lons = np.array([102.5,5.5,116.2,102.5,5.5,116.2,102.5,5.5,116.2,100])
vals = np.array([3,4,5,6,7,7,9,1,0,4])


def unique_order(seq): 
    # http://www.peterbe.com/plog/uniqifiers-benchmark (Dave Kirby)
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

unique_lats, idx_lats = np.unique(lats, return_inverse=True)
unique_lons, idx_lons = np.unique(lons, return_inverse=True)
perm_lats = np.argsort(unique_order(lats))
perm_lons = np.argsort(unique_order(lons))

result = np.empty((len(unique_lats), len(unique_lons)))
result.fill(np.nan)
result[perm_lats[idx_lats], perm_lons[idx_lons]] = vals
print(result)

yields

[[  3.   4.   5.  nan]
 [  6.   7.   7.   4.]
 [  9.   1.   0.  nan]]
unutbu
  • This looks great, but I keep getting a `ValueError: total size of new array must be unchanged`. I'm guessing I am mixing something up somewhere as this and danodonovan's answer are both pretty straightforward. – ryanjdillon Feb 27 '13 at 18:50
  • Also, you are right on the output. And the error I'm experiencing is when I am using this with my actual dataset/script, not the example. – ryanjdillon Feb 27 '13 at 18:52
  • The `ValueError` is saying that `len(vals)` does not equal `len(np.unique(lats)) * len(np.unique(lons))`. If `len(vals)` is too long, do you want to truncate `vals`? And if `len(vals)` is too short, do you want to fill the rest of the array with `0`s? There are lots of other possibilities too... – unutbu Feb 27 '13 at 18:55
  • Ah... I see that my lats have fewer unique than the lons (all are unique). I suppose that I would like to fill the rest with missing values such as NaN or -9999, or whatever is appropriate. – ryanjdillon Feb 27 '13 at 19:07
  • Do you happen to have a suggested method for filling gaps? It appears [reshape](http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html) doesn't support this directly. – ryanjdillon Feb 27 '13 at 19:32
  • After taking a second look at things, it appears this doesn't actually function as I thought. I have added an update to my question to explain. Thanks again for the help! – ryanjdillon Mar 01 '13 at 10:15
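
The `ValueError` discussed in these comments can be reproduced with the question's sample data: there are 10 values, but 3 unique latitudes times 4 unique longitudes gives 12 cells, so `reshape` has nothing to put in the remaining two. A small sketch (the exact error message depends on the NumPy version):

```python
import numpy as np

lats = np.array([45.5, 45.5, 45.5, 65.3, 65.3, 65.3, 43.2, 43.2, 43.2, 65.3])
lons = np.array([102.5, 5.5, 116.2, 102.5, 5.5, 116.2, 102.5, 5.5, 116.2, 100])
vals = np.array([3, 4, 5, 6, 7, 7, 9, 1, 0, 4])

# Grid shape implied by the unique coordinates
shape = (len(np.unique(lats)), len(np.unique(lons)))
n_cells = shape[0] * shape[1]
print(len(vals), "values for", n_cells, "cells")

reshape_failed = False
try:
    vals.reshape(shape)  # cannot work: the element count differs from n_cells
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```

Allocating a NaN-filled array of that shape and assigning `vals` by index, as the answers above do, sidesteps the size mismatch.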