I have a DataFrame which stores a 2D array in the first column and a 1D vector with three elements in the second column:

import numpy as np, pandas as pd

A = pd.DataFrame(
    {
      'array': [np.array([[1,2,3],[4,5,6]]), np.array([[7,8,9],[10,11,12]])],
      'vector': [np.array([0.19,0.11,-0.2]), np.array([0.12,0.27,0.4])],
    }, index=['top','bottom'])

I would like to multiply each whole array by the sign of the last value of its vector, so that the DataFrame goes from:

                            array              vector
top        [[1, 2, 3], [4, 5, 6]]  [0.19, 0.11, -0.2]
bottom  [[7, 8, 9], [10, 11, 12]]   [0.12, 0.27, 0.4]

to this one:

                                           array              vector
top     [[-1.0, -2.0, -3.0], [-4.0, -5.0, -6.0]]  [0.19, 0.11, -0.2]
bottom     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]   [0.12, 0.27, 0.4]

What I've tried:

A['array'] /= np.sign(A['vector'][2])

but I have a KeyError: 2 (probably because that number 2 is used to access the rows, not the values inside my vectors).
And trying to access the vector data using this is not working either:

A['vector'][:,2]
KeyError: 'key of type tuple not found and not a MultiIndex'

So, is it possible to achieve that with a natural, simple, NumPy-style vectorized operation (i.e. without using .apply())?
Because this is working but seems overkill for what it does (but I can actually live with it):

A['array'] /= A['vector'].apply(lambda x: np.sign(x[2]))

%timeit: 630 µs ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Strangely, it also didn't raise the famous warning "A value is trying to be set on a copy of a slice from a DataFrame", which I was actually expecting.

All this serves as an example; I may have more linear algebra operations to apply between some arrays stored as columns in my DataFrame.

I also know I can stay out of Pandas' world and work only within NumPy, but sometimes I appreciate having the column headers and row indices of a DataFrame acting as human-readable pointers, especially when dealing with complex arrays. Working with dictionaries could also be fine, but they may not be as well suited as Pandas(?).

swiss_knight
  • hi! Is any one of the answers below working? If so & if you wish, you might consider [accepting](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) one of them to signal others that the issue is resolved. If not, you can provide feedback so they can be improved (or removed altogether) – Anurag Dabas Jul 27 '21 at 07:01

2 Answers


Try via str.get() and np.sign():

A['array'] = A['array'] * np.sign(A['vector'].str.get(-1))

OR

Try via np.vstack() and np.sign():

A['array'] = A['array'] * np.sign(np.vstack(A['vector'].values)[:, -1])
# you can also use np.stack() in place of np.vstack()

OR

Try via mul() and map():

A['array'] = A['array'].mul(A['vector'].map(lambda x: -1 if x[-1] < 0 else 1))

Output of A:

                                           array              vector
top     [[-1.0, -2.0, -3.0], [-4.0, -5.0, -6.0]]  [0.19, 0.11, -0.2]
bottom     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]   [0.12, 0.27, 0.4]
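
For the stacking variant, a fully NumPy-side sketch is given here as an illustration. It assumes every entry of 'array' has the same shape and every entry of 'vector' has the same length, so that both columns can be stacked into regular arrays; the names vectors, signs, arrays and flipped are only illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    {
        "array": [np.array([[1, 2, 3], [4, 5, 6]]), np.array([[7, 8, 9], [10, 11, 12]])],
        "vector": [np.array([0.19, 0.11, -0.2]), np.array([0.12, 0.27, 0.4])],
    },
    index=["top", "bottom"],
)

# Assumption: all vectors share one length, so they stack into (n_rows, 3).
vectors = np.stack(A["vector"].to_numpy())

# Sign of the last component of each vector, shape (n_rows,).
signs = np.sign(vectors[:, -1])

# Assumption: all matrices share one shape, so they stack into (n_rows, 2, 3).
arrays = np.stack(A["array"].to_numpy())

# Broadcast one sign over each whole matrix.
flipped = arrays * signs[:, None, None]

# Write the result back as one 2D array per row.
A["array"] = list(flipped)
```

Multiplying rather than dividing by the sign gives the same result for non-zero values and avoids a division by zero when the last component is exactly 0, since np.sign(0) is 0.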
Anurag Dabas

Despite its name, the str accessor can be used to index the arrays here. This works through duck typing: str[-1] takes the last item of each entry, and the entries happen to support indexing.

So

A.array /= np.sign(A.vector.str[-1])

gives

>>> A
                                           array              vector
top     [[-1.0, -2.0, -3.0], [-4.0, -5.0, -6.0]]  [0.19, 0.11, -0.2]
bottom     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]   [0.12, 0.27, 0.4]
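
To make the duck-typing point concrete, here is a small sketch comparing .str[-1] with an explicit loop over the entries. The Series name vectors and the variables last, last_explicit and signs are only illustrative, and the behaviour is assumed to hold for an object-dtype Series of NumPy arrays as in the question.

```python
import numpy as np
import pandas as pd

vectors = pd.Series(
    [np.array([0.19, 0.11, -0.2]), np.array([0.12, 0.27, 0.4])],
    index=["top", "bottom"],
)

# .str[-1] indexes each entry with [-1]; the entries need not be strings.
last = vectors.str[-1]

# Explicit equivalent that does not rely on the string accessor.
last_explicit = pd.Series([v[-1] for v in vectors], index=vectors.index)

# One sign per row, aligned on the original index.
signs = np.sign(last.astype(float))
```

Both last and last_explicit hold one scalar per row, so either can be passed to np.sign and combined with the 'array' column by index alignment.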
Mustafa Aydın