0

I have dataframes that can contain a mix of booleans and integers. I'd like to be able to do things like `df_1 == df_2.loc[0, 0]` and guarantee that if `df_2.loc[0, 0]` is `1`, it won't match `True` values in `df_1`.

OCa
user6118986
  • 1
    Could you elaborate on what you mean by "I have dataframes that can contain a mix of booleans and integers"? What dtypes? object? – Brian61354270 Jul 28 '23 at 00:01
  • The dataframes are formed from csv data which will contain strings, ints, floats, bools, etc in unknown columns - so the dtypes aren't really known, it's whatever `read_csv` decides – user6118986 Jul 28 '23 at 00:04
  • 1
    A single column is always a single datatype. You can't mix booleans and ints within a single column. – Tim Roberts Jul 28 '23 at 00:14
  • Dataframes _do_ have fixed dtypes. What are the dtypes of the dataframes you're working with? – Brian61354270 Jul 28 '23 at 00:14
  • 2
    Perhaps you should show us an example of the data you're using. – Tim Roberts Jul 28 '23 at 00:17
  • 2
    Just as a note, sometimes a data cleaning step is useful - i.e., converting `0,1,'yes','no','true','false','off','on','oui','non'` and all your other mixed up values to clean booleans. – topsail Jul 28 '23 at 00:20
  • 3
    @Brian61354270 I don't know the OP's data, but you can easily create this with `df = pd.DataFrame({'col': [True, 1, False, 0]})`. The dtype is `dtype('O')`. – Barmar Jul 28 '23 at 00:44
  • 1
    @TimRoberts See my above comment. – Barmar Jul 28 '23 at 00:46
  • 1
    If you print the df, it shows `True` and `1` in the first two rows. But `df['col'] == 1` is True in both rows. – Barmar Jul 28 '23 at 00:47
  • 1
    @OCa [Please don't change code functionality/conventions in the question.](//meta.stackoverflow.com/q/260245/4518341) – wjandrea Jul 28 '23 at 16:14
  • 1
    Are you talking about bool and int in one column, like Barmar showed, or some bool columns, some int columns? An example would help a lot; check out [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jul 28 '23 at 16:53
  • Got your answer now, I believe. Interesting question! I suppose the downvoting happened because you didn't include an input dataframe and a desired output. – OCa Jul 29 '23 at 11:25

2 Answers

1

Pre-processing your data so that booleans and integers never share a column is the better practice. If you cannot separate them in your dataframes, wrap `==` in a function that also checks whether both operands are booleans:

def BoolProofCompare(a, b):
    '''Override default True == 1, False == 0 behavior'''
    return a==b and isinstance(a, bool)==isinstance(b, bool)

BoolProofCompare(1, True)  # False
BoolProofCompare(0, False)  # False
BoolProofCompare(1, 1)  # True
BoolProofCompare(False, False)  # True
# and so on and so forth

As I understand it, you want a cell-by-cell comparison of a single value, e.g. `df_2[0][0]`, with each element of a dataframe, e.g. `df_1`, with the `True == 1` and `False == 0` equalities disabled. In that case, use `applymap` to apply the above comparison to every cell (in pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`):

# my example of input dataframe
df
    col1  col2
0   True     1
1      1     2
2  False     3
3      0     4

df.applymap(lambda x : BoolProofCompare(x, True))
    col1   col2
0   True  False
1  False  False
2  False  False
3  False  False

df.applymap(lambda x : BoolProofCompare(x, False))
    col1   col2
0  False  False
1  False  False
2   True  False
3  False  False

df.applymap(lambda x : BoolProofCompare(x, 1))
    col1   col2
0  False   True
1   True  False
2  False  False
3  False  False

df.applymap(lambda x : BoolProofCompare(x, 0))
    col1   col2
0  False  False
1  False  False
2  False  False
3   True  False

It is more convenient to wrap the enhanced comparison in a function that takes the dataframe and the value:

def BoolProofCompare_df(df, a):
    '''
    Compare single value *a* with dataframe *df*, cell by cell, 
    with True==1 and False==0 equalities disabled.
    '''
    return df.applymap(lambda x : BoolProofCompare(x, a))
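
One caveat when the value comes from another dataframe, as in the question: `df_2.loc[...]` on a typed column usually returns a NumPy scalar, and `np.bool_` is not a subclass of Python's `bool`, so a plain `isinstance(x, bool)` check would miss it. The following is a minimal sketch of one way to handle that, not a definitive implementation. The names `is_bool`, `bool_proof_compare`, `bool_proof_compare_df`, and the column names `flag` and `count` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def is_bool(x):
    """True for Python bools and for NumPy bools (np.bool_ does not subclass bool)."""
    return isinstance(x, (bool, np.bool_))


def bool_proof_compare(a, b):
    """Equality that never lets a boolean match a number."""
    return bool(a == b) and is_bool(a) == is_bool(b)


def bool_proof_compare_df(df, a):
    """Compare scalar *a* with every cell of *df*."""
    # DataFrame.map replaces applymap from pandas 2.1 on; fall back on older versions.
    elementwise = df.map if hasattr(df, "map") else df.applymap
    return elementwise(lambda x: bool_proof_compare(x, a))


# Illustrative data (assumption): col1 mixes bools and ints, so its dtype is object.
df_1 = pd.DataFrame({"col1": [True, 1, False, 0], "col2": [1, 2, 3, 4]})
df_2 = pd.DataFrame({"flag": [True, False], "count": [1, 0]})

needle = df_2.loc[0, "count"]  # a NumPy integer scalar with value 1
result = bool_proof_compare_df(df_1, needle)
print(result)
```

With `needle` taken from the integer column, only the integer `1` cells match; taking it from the `flag` column instead matches only the `True` cell.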
OCa
0

See @OCa's answer for the `BoolProofCompare` function. Here is an alternative implementation that also treats `0` (int) as different from `0.0` (float):

def BoolProofCompare(a, b):
    return a == b and type(a) == type(b)

A plain `return a == b` doesn't work because in Python both `True == 1` and `True == 1.0` evaluate to `True`: `bool` is a subclass of `int`.
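
The behavior described above can be checked in plain Python, without pandas. This is a small illustrative sketch; `strict_equal` is a hypothetical name for the same type-strict comparison shown in this answer.

```python
# bool is a subclass of int, so True and False compare equal to 1 and 0.
print(issubclass(bool, int))     # True
print(True == 1, True == 1.0)    # True True
print(False == 0, False == 0.0)  # True True


def strict_equal(a, b):
    """Hypothetical helper: values are equal and have exactly the same type."""
    return type(a) is type(b) and a == b


print(strict_equal(1, True))  # False: int vs bool
print(strict_equal(0, 0.0))   # False: int vs float
print(strict_equal(1, 1))     # True
```

Because the check is on the exact type, it also separates `int` from `float`, which the `isinstance(..., bool)` variant does not.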

pts