I have two ndarrays of different shapes. The bigger one has a shape of (22470, 2) and looks like this:

df1

array([[-0.39911392,  0.46759156],
       [ 0.28343494,  0.88479157],
       [-0.0114085 , -1.23768313],
       ...,
       [-0.35930586,  0.54784439],
       [-0.37994004,  0.51332771],
       [-0.36309593,  0.49318486]])

The smaller one holds the outliers of the df1 array. Its shape is (675, 2) and it looks like this:

df2
array([[-0.04450032,  0.31053589],
       [-0.4320086 ,  0.14815988],
       [-0.07948631, -1.32638555],
       ...,
       [-0.32619787,  0.34910699],
       [-0.50870225, -0.230849  ],
       [-0.43532727,  0.49763502]])

I tried to subtract one from the other to get a new array that contains everything in df1 except the rows in df2, but it gives me this error:

ValueError: operands could not be broadcast together with shapes (22470,2) (675,2)

How can I do it in Python?

Jason Aller
noob
  • see https://stackoverflow.com/q/32832923/6692898 – RichieV Sep 02 '20 at 21:24
  • what is your expected output? I don't understand how your subtraction should work – RichieV Sep 02 '20 at 21:25
  • @RichieV for example, if df1=[[1,2],[2,2],[3,6]] and df2=[2,2], then when I subtract them (I don't know if subtraction is the right term) I would like the result to be dfnew=[[1,2],[3,6]] – noob Sep 02 '20 at 21:30
  • check this [answer](https://stackoverflow.com/a/53645883/6692898), find `LEFT-Excluding` and you should be set – RichieV Sep 02 '20 at 21:38
  • What function or operator were you using? `df1-df2` or something else? – hpaulj Sep 02 '20 at 22:48
  • What are the criteria for keeping or removing elements? I assume you want to somehow match whole rows (2 numbers)? Keep in mind that floats rarely match exactly. – hpaulj Sep 02 '20 at 23:44

1 Answer

"Subtracting" two arrays does not perform set operations on the arrays, it simply subtracts the values of one from the values of the other (i.e. 4 - 3 => 1).

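As a small illustration of that point, here is a sketch with made-up values (the names `a`, `b`, `c` are only for this example). It shows that `-` works element-wise, and that it raises the same kind of error as in the question when the shapes cannot be broadcast together:

```python
import numpy as np

a = np.array([[4.0, 6.0], [8.0, 10.0], [1.0, 1.0]])  # shape (3, 2)
b = np.array([3.0, 5.0])                              # shape (2,)

# Element-wise subtraction: b is broadcast across every row of a.
diff = a - b

# Shapes (3, 2) and (2, 2) are not broadcast-compatible, so this raises.
c = np.array([[1.0, 1.0], [2.0, 2.0]])
try:
    a - c
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```
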
What you want is essentially a set operation (a set difference on rows). There is no simple, straightforward way to do that with the data as you have presented it, though that doesn't mean it can't be done. Comparing floating-point numbers for exact equality is a bad idea; you will find it much more useful to collect an array of the outliers' indices rather than their values. Then you can index your array as in this question.

So this would then be something like

import numpy as np

df1 = np.array([[1.234, 2.345], [3.3452, 2.456], [5.234, 7.453]])

# This is an array of row indices, not float values.
df2 = np.array([1])

# Start with a mask that keeps every row, then switch off the outlier rows.
keep = np.ones(len(df1), dtype=bool)
keep[df2] = False
newdf = df1[keep]

# newdf: [[1.234, 2.345], [5.234, 7.453]]
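
If you only have the outlier values and not their indices, one possible workaround is to match rows within a tolerance instead of testing exact equality. The following is only a sketch: the helper name `remove_rows`, the variable names, and the default `atol` are assumptions made for illustration, and a suitable tolerance depends on how the outliers were produced.

```python
import numpy as np

def remove_rows(data, rows_to_remove, atol=1e-8):
    """Return the rows of `data` that match no row of `rows_to_remove` within `atol`."""
    data = np.asarray(data, dtype=float)
    rows_to_remove = np.atleast_2d(np.asarray(rows_to_remove, dtype=float))
    # Compare every row of data with every row to remove: result shape (n, m, k).
    close = np.isclose(data[:, None, :], rows_to_remove[None, :, :], rtol=0.0, atol=atol)
    # A data row is dropped when all of its columns are close to some row to remove.
    matches = close.all(axis=2).any(axis=1)
    return data[~matches]

data = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 6.0]])
outliers = np.array([[2.0, 2.0]])
remaining = remove_rows(data, outliers)
```

Note that the intermediate comparison array has one boolean per (row, outlier row, column) combination, so for shapes like (22470, 2) and (675, 2) it holds roughly 30 million entries. Keeping the indices, as above, avoids that cost.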
lxop
  • sorry, subtraction was the wrong term to use. I don't want to subtract the arrays; I want a new array that contains everything except the df2 array. For example, if df1=array([[1,2],[2,2],[3,6]]) and df2=array([2,2]), then newdf would be array([[1,2],[3,6]]) – noob Sep 02 '20 at 21:34
  • Yes, I understand that, and that is what I have answered. I will update the answer to be clearer – lxop Sep 02 '20 at 21:35