Write function for .apply that analyzes column as a whole?

Question

I have a dataframe (it is the product of using the pivot function, which is why it has the c and a):

c 367235    419895  992194
a
1999-02-06  Nan 9   Nan
2000-04-03  2   Nan Nan
1999-04-12  Nan Nan 4
1999-08-08  2   Nan Nan
1999-11-01  8   5   1
1999-12-08  Nan 3   Nan
1999-08-17  Nan Nan 10
1999-10-22  Nan 3   Nan
1999-03-23  Nan 4   Nan
2000-09-29  9   Nan Nan
1999-04-30  Nan Nan 1
2000-09-30  9   Nan Nan

I would like to add a new row at the bottom of this dataframe. Each cell in the new row will evaluate the column above it; if the column contains the numbers 9, 8 or 3, the cell will evaluate to "TRUE". If the column does not contain those numbers, the cell will evaluate to "FALSE". Ultimately, my goal is to delete the columns with a "FALSE" cell using the drop function, creating a dataset like so:

c 367235    419895
a
1999-02-06  Nan 9
2000-04-03  2   Nan
1999-04-12  Nan Nan
1999-08-08  2   Nan 
1999-11-01  8   5
1999-12-08  Nan 3
1999-08-17  Nan Nan
1999-10-22  Nan 3   
1999-03-23  Nan 4
2000-09-29  9   Nan
1999-04-30  Nan Nan
2000-09-30  9   Nan
           TRUE TRUE

My problem:

I can write a function that evaluates if one of several numbers are in a list, but I cannot write this function into .apply.

That is, I found that this works for determining if a group of numbers is in a list:

How to check if one of the following items is in a list?

I tried to modify it as follows for the apply function:

def BIS(i):
        L1 = [9,8,3]
        if i in L1:
            return "TRUE"
    else:
        return "FALSE"
df_wanted.apply(BIS, axis = 0)

this results in an error: ('the truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item, a.any().' u'occured at index 367235')

This makes me think that although .apply takes an entire column as input, it cannot aggregate the truth value of all the individual cells and come up with a total truth value about the column. I looked up a.any and a.bool, and they look very useful, but I don't know where to stick them in? For example, this didn't work:

df_wanted.apply.any(BIS, axis = 0)

nor did this

df_wanted.apply(BIS.any, axis = 0).

Can anyone point me in the right direction? Many thanks in advance

Psidom · Accepted Answer · 2016-12-15T18:23:04.727

1

You can use the .isin() method:

df.loc[:, df.isin(['9','8','3']).any()]

And if you need to append the condition to the data frame:

cond = df.isin(['9','8','3']).any().rename("cond")
df.append(cond).loc[:, cond]

edited Dec 15 '16 at 18:23

answered Dec 15 '16 at 18:15

Psidom

209,562
33
339
356

that finished dataset looks beautiful - just what I want. And the logic seems right too, but I'm still pretty clueless about python and cannot recreate your results. When I put in your code, I get: "Empty Dataframe Columns: [] Index: [1999-02-06 00:00:00, ...[long list of dates]. Any idea what step I missed? – oymonk Dec 15 '16 at 18:27
It's possible in your original data frame the data types are numeric. Try to replace `['9','8','3']` with `[9,8,3]`. – Psidom Dec 15 '16 at 18:29

Write function for .apply that analyzes column as a whole?

1 Answers1