
I am working on some molecular dynamics using Python, and the arrays tend to get pretty large. It would be helpful to have a quick check to see whether certain vectors appear in the arrays. After searching for a way to do this, I was surprised to see that this question doesn't seem to come up. In particular, if I have something like

import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])

and I want to see if the specific vectors from y are present in x (not just the elements), how would I do so? Using something like

for i in  y:
    if i in x:
        print(i)

will simply print every vector in y that shares at least one element, in the same position, with some row of x. Thoughts?
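
For context, here is a minimal sketch of why the loop over-matches. For a NumPy array, `i in x` behaves like `(x == i).any()`, so a single matching element in the same column is enough. The names `results`, `loose` and `strict` are only illustrative and are not part of the original question.

```python
import numpy as np

y = [[1, 2, 3], [1, 3, 2]]
x = np.array([[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]])

results = []
for row in y:
    elementwise = (x == row)                      # boolean array, same shape as x
    loose = bool(elementwise.any())               # what `row in x` effectively tests
    strict = bool(elementwise.all(axis=1).any())  # True only if a whole row matches
    results.append((row, loose, strict))
    print(row, loose, strict)
# [1, 2, 3] True True
# [1, 3, 2] True False
```

The second vector passes the loose test only because its first element equals the first element of `[1, 2, 3]`.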

hpaulj
  • If the number of vectors can get large and you want it to be quick I would look into using a hash table. Need a hash function for your vectors and a dictionary. – Kurt Stutsman Mar 04 '17 at 00:59
  • But lists and arrays aren't hashable! – hpaulj Mar 04 '17 at 01:07
  • So what result do you expect from your x and y? Does order matter? What's the typical shape of `x`; len of `y`? – hpaulj Mar 04 '17 at 01:11
  • The problem of finding whether rows of one array are present in another is related to the question of finding unique rows or duplicates in an array. `numpy` `unique` and `in1d` work with 1d arrays, but not 2d. The workaround is to transform the 2d array into a 1d one with whole-row elements. For example: http://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array – hpaulj Mar 04 '17 at 05:10
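
The hash-table idea from the comments can be sketched as follows. Lists and arrays are not hashable, but tuples are, so one possible approach (an assumption, not something the commenters spelled out) is to convert each row of `x` to a tuple and store the tuples in a `set`. The names `x_rows` and `present` are illustrative.

```python
import numpy as np

y = [[1, 2, 3], [1, 3, 2]]
x = np.array([[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]])

# Build the lookup table once: each row becomes a hashable tuple.
x_rows = set(map(tuple, x.tolist()))

# Each membership test is then a hash lookup rather than a scan of x.
present = [tuple(row) in x_rows for row in y]
print(present)  # [True, False]
```

This compares values exactly, so floating-point coordinates would probably need rounding to a fixed precision before hashing.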

3 Answers


If you want to check if ALL vectors in y are present in the array, you could try:

import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])

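# `i in x` is True when any single element matches, so compare whole rows instead
all((x == row).all(axis=1).any() for row in y)
# False ([1,3,2] is not a row of x)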
ODiogoSilva

You don't explicitly give your expected output, but I infer that you want to see only [1, 2, 3] as the output from this program.

You get that output if you make x merely another list, rather than a NumPy array.
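
A minimal sketch of that suggestion, assuming the array is converted with `tolist()` (the names `x_list` and `found` are illustrative):

```python
import numpy as np

y = [[1, 2, 3], [1, 3, 2]]
x = np.array([[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]])

# Nested Python lists compare row against row, not element against element.
x_list = x.tolist()

found = [row for row in y if row in x_list]
print(found)  # [[1, 2, 3]]
```

Each `in` test scans `x_list` linearly, so this is likely to be slow when `x` is large.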

Prune

The best strategy will depend on sizes and numbers. A quick solution is

[np.where(np.all(x==row, axis=-1))[0] for row in y]
# [array([0]), array([], dtype=int64)]

The result list gives for each row in y a possibly empty array of positions in x where the row occurs.
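
If only a yes/no answer per row is needed, one possible variant (a sketch, not part of the original answer) is to compare all rows of y against all rows of x in a single broadcast. The names `y_arr`, `matches`, `present` and `first_index` are illustrative.

```python
import numpy as np

y = [[1, 2, 3], [1, 3, 2]]
x = np.array([[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]])

y_arr = np.asarray(y)

# matches[i, j] is True when row i of y equals row j of x.
matches = (y_arr[:, None, :] == x[None, :, :]).all(axis=-1)

# One flag per row of y, plus the first matching position in x (-1 if none).
present = matches.any(axis=1)
first_index = np.where(present, matches.argmax(axis=1), -1)

print(present)      # [ True False]
print(first_index)  # [ 0 -1]
```

The intermediate boolean array has `len(y) * len(x) * 3` elements, so this trades memory for speed and may not suit very large inputs.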

Paul Panzer