
I am trying to obtain a boolean array that indicates whether each element of a second numpy array is also present in a first array.

The challenging part is that each array is made up of latitude/longitude pairs, and I want to check whether each lat/lon pair in secondcoords is also in firstcoords. So this is like an intersection.

Here is what I have done thus far (with small example coordinates):

import numpy

firstlat = [0, 1, 5, 5]
firstlon = [1, 0, 5, 4]

secondlat = [0, 2, 0, 5]
secondlon = [1, 2, 5, 5]

firstcoords = numpy.array((firstlat, firstlon))
firstcoords = numpy.transpose(firstcoords)  # one (lat, lon) pair per row

secondcoords = numpy.array((secondlat, secondlon))
secondcoords = numpy.transpose(secondcoords)

a = numpy.isin(secondcoords, firstcoords)

Wrong output:
[[ True  True]
 [False False]
 [ True  True]
 [ True  True]]

Wanted output: [[True, False, False, True]]

numpy.isin flattens its arguments, so although firstcoords[0] = [0 1], it compares element by element rather than pair by pair. Each element I care about, however, consists of both values [lat lon]; the purpose of the transpose was to get the lats/lons into tuple-like form for easier comparison. How do I fix my approach, or what other approaches would be feasible for this problem?
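
To make the flattening concrete, here is a small sketch using the example coordinates above. It suggests that numpy.isin tests every scalar against the pool of all values found in firstcoords, which is why whole rows come back True even when the pair itself is absent. The names `elementwise`, `pool` and `same_as_pool` are illustrative only.

```python
import numpy

firstcoords = numpy.array([[0, 1], [1, 0], [5, 5], [5, 4]])
secondcoords = numpy.array([[0, 1], [2, 2], [0, 5], [5, 5]])

# isin ignores the row structure: every scalar is tested on its own
elementwise = numpy.isin(secondcoords, firstcoords)

# The same mask results from testing against the flat pool of values
pool = numpy.unique(firstcoords)  # the distinct values 0, 1, 4, 5
same_as_pool = numpy.isin(secondcoords, pool)

print(elementwise)
print(numpy.array_equal(elementwise, same_as_pool))
```

The pair [0, 5] is not in firstcoords, yet both 0 and 5 occur somewhere in it, so that row is reported as [True, True].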

Neuron
Hamza

4 Answers


If you zip the latitudes and longitudes up into tuples:

firstlat = [0, 1, 5, 5]
firstlon = [1, 0, 5, 4]

secondlat = [0, 2, 0, 5]
secondlon = [1, 2, 5, 5]

first_lat_lon = list(zip(firstlat,firstlon))
second_lat_lon = list(zip(secondlat,secondlon))

Then you can easily check which of the second list are in the first:

[x in first_lat_lon for x in second_lat_lon]

Which returns:

[True, False, False, True]
Lev Zakharov
T Burgis
  • I'm sorry, I forgot to mention: I tried zip already and it takes insanely long given the size of my dataset. – Hamza Aug 08 '18 at 22:50
  • Where does the data come from? Is it not possible to create the tuples directly from the source? – T Burgis Aug 08 '18 at 22:54
  • No, the source just provides the 2 sets of lats/lons, analogous to those in the example. – Hamza Aug 08 '18 at 23:23

I don't know that the functionality you're looking for is possible in numpy. I recommend using the following:

in_second_and_first = set(zip(secondlat,secondlon)) & set(zip(firstlat,firstlon))

If you're using Python 2 (which I would strongly recommend against), use itertools.izip instead of the built-in zip.
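
Note that the intersection above yields the set of shared pairs rather than a boolean array. A minimal sketch of how the same set idea could produce the mask the question asks for, assuming Python 3 and the example lists (the names `first_pairs`, `mask` and `shared` are illustrative):

```python
firstlat = [0, 1, 5, 5]
firstlon = [1, 0, 5, 4]

secondlat = [0, 2, 0, 5]
secondlon = [1, 2, 5, 5]

# Build the lookup set once; membership tests on a set are O(1) on average
first_pairs = set(zip(firstlat, firstlon))

# One boolean per pair in the second list, in the original order
mask = [pair in first_pairs for pair in zip(secondlat, secondlon)]

# The set intersection, for comparison: which pairs are shared
shared = set(zip(secondlat, secondlon)) & first_pairs

print(mask)    # [True, False, False, True]
print(shared)
```

Unlike testing membership in a list, each lookup here does not scan all of the first pairs, which may matter for large inputs.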

PMende

One hack to make use of isin is to use structured arrays, even though your arrays are not really 1-dimensional.

import numpy as np

firstlat = [0, 1, 5, 5]
firstlon = [1, 0, 5, 4]

secondlat = [0, 2, 0, 5]
secondlon = [1, 2, 5, 5]

firstcoords = np.array(list(zip(firstlat, firstlon)), dtype=[("lat", int), ("lon", int)])
# array([(0, 1), (1, 0), (5, 5), (5, 4)], dtype=[('lat', '<i8'), ('lon', '<i8')])
secondcoords = np.array(list(zip(secondlat, secondlon)), dtype=[("lat", int), ("lon", int)])
# array([(0, 1), (2, 2), (0, 5), (5, 5)], dtype=[('lat', '<i8'), ('lon', '<i8')])

np.isin(secondcoords, firstcoords)
# array([ True, False, False,  True])
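
As a variation, the structured arrays could be filled field by field instead of going through list(zip(...)). This is only a sketch: the helper `as_pairs` and the name `pair_dtype` are hypothetical, the integer dtype is an assumption taken from the example, and no timing has been done to show it is faster.

```python
import numpy as np

# Assumed record layout: one integer lat and one integer lon per element
pair_dtype = np.dtype([("lat", np.int64), ("lon", np.int64)])

def as_pairs(lat, lon):
    """Pack two equal-length sequences into a 1-D structured array."""
    out = np.empty(len(lat), dtype=pair_dtype)
    out["lat"] = lat  # fill each field in one vectorized assignment
    out["lon"] = lon
    return out

firstcoords = as_pairs([0, 1, 5, 5], [1, 0, 5, 4])
secondcoords = as_pairs([0, 2, 0, 5], [1, 2, 5, 5])

# Each (lat, lon) record is now a single element, so isin compares pairs
mask = np.isin(secondcoords, firstcoords)
print(mask)
```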

Reference

How to make Numpy treat each row/tensor as a value

Get intersecting rows across two 2D numpy arrays

Tai
  • Tai, this is the fastest of the proposed answers, yet it is still slower than I need. Do you see a more efficient way using structured arrays? – Hamza Aug 09 '18 at 03:26
  • @Hamza I have no clue for now. – Tai Aug 09 '18 at 03:42

Use a np.void view. This just looks at every row as a chunk of data instead of discrete values.

import numpy as np

def vview(a):  # based on @jaime's answer: https://stackoverflow.com/a/16973510/4427777
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

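Then you can use np.isin as you wanted:

# each row is one void element, so the result has shape (n, 1); ravel() gives a 1-D mask
a = np.isin(vview(secondcoords), vview(firstcoords)).ravel()

Beware that this is a comparison at the byte level, so there is no way to deal with floating-point inaccuracies if your actual data is floats. On the other hand, it is extremely fast, since it does not restructure your data; np.ascontiguousarray only makes a copy when the input is not already C-contiguous (which is the case for a transposed array).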
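
If the real coordinates are floats, one possible workaround for the caveat above is to snap them to a fixed grid of integers before any exact comparison. The sketch below is an assumption rather than part of this answer: the helper `quantize` and the tolerance of 6 decimal places are hypothetical, and values that sit very close to a rounding boundary can still land in different cells.

```python
import numpy as np

def quantize(coords, decimals=6):
    """Round float lat/lon pairs to integer multiples of 10**-decimals degrees."""
    scaled = np.asarray(coords, dtype=float) * 10**decimals
    return np.round(scaled).astype(np.int64)

# 0.1 + 0.2 and 0.3 are intended to be the same latitude
first = np.array([[0.1 + 0.2, 1.0], [5.0, 4.0]])
second = np.array([[0.3, 1.0], [5.0, 5.0]])

# Exact comparison fails because the two floats differ in their last bits
raw_equal = bool((first[0] == second[0]).all())

# After quantizing, the pairs are exact integers and compare reliably
known = set(map(tuple, quantize(first).tolist()))
mask = [tuple(row) in known for row in quantize(second).tolist()]

print(raw_equal)
print(mask)
```

The quantized integer arrays could equally be passed to any of the exact row-matching approaches discussed here.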

Daniel F