I feel I'm making this harder than it should be. I have a dataframe in which some columns hold a numpy array in every entry (the names of these columns are in an array called names_of_cols_that_contain_arrays). I want to filter out the rows for which these numpy arrays sum to zero. My code is based on a similar question, but that approach doesn't seem to work when I iterate over the rows in each column.

What I have currently in my code is

for col_name in names_of_cols_that_contain_arrays:
  for i in range(len(df[col_name])):
    df = df[df[col_name][i].sum() > 0.0]

which doesn't seem that efficient, but it is a first attempt that explicitly goes through what I thought would be the correct method. But this appears to return a boolean, i.e.

Traceback
...
KeyError: True

In fact, most variations of the code above give me some error associated with a boolean being returned. Any pointers would be appreciated, thanks in advance!

1 Answer

IIUC:

You can try:

df=df.loc[df['names_of_cols_that_contain_arrays'].map(sum)>0]
#OR
df=df.loc[df['names_of_cols_that_contain_arrays'].map(np.sum).gt(0)]

Sample dataframe used:

import pandas as pd
from numpy import array

d={'names_of_cols_that_contain_arrays': {0: array([-1,  0, -8]),
  1: array([-1, -2,  5])}}

df=pd.DataFrame(d)
Anurag Dabas
    Ah yes, this has fixed the issue! (just with the `for col_name in names_of_cols_that_contain_arrays` added back in), I didn't realise this `.map` function was a thing until recently, didn't appreciate it until now, think I need to do more investigating of it. – GluonicPenguin Aug 11 '21 at 12:03
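
For reference, the combination mentioned in the comment (the `.map` filter with the `for col_name in names_of_cols_that_contain_arrays` loop added back) might look roughly like the sketch below. The column names `a` and `b` and the sample values are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two columns whose cells each hold a numpy array.
df = pd.DataFrame({
    "a": [np.array([1, 2]), np.array([0, 0]), np.array([3, 4])],
    "b": [np.array([5, 6]), np.array([7, 8]), np.array([0, 0])],
})
names_of_cols_that_contain_arrays = ["a", "b"]

# Filter one column at a time; a row survives only if the array in
# every listed column has a positive sum.
for col_name in names_of_cols_that_contain_arrays:
    df = df.loc[df[col_name].map(np.sum) > 0]

kept_index = df.index.tolist()
print(kept_index)
```

Note that `> 0` also drops rows whose arrays sum to a negative value. If only exact zeros should be removed, comparing with `!= 0` instead is one option.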