
Consider df:

In [2098]: df = pd.DataFrame({'a': [1,2], 'b':[3,4]})

In [2099]: df
Out[2099]: 
   a  b
0  1  3
1  2  4

Now, I try to append a list of values to df:

In [2102]: df.loc[2] = [3, 4]

In [2103]: df
Out[2103]: 
   a  b
0  1  3
1  2  4
2  3  4

All's good so far.

But when I try to append a row containing a list of boolean values, the booleans are converted to ints:

In [2104]: df.loc[3] = [True, False]

In [2105]: df
Out[2105]: 
   a  b
0  1  3
1  2  4
2  3  4
3  1  0

I know I can convert my df to str and then append boolean values, like:

In [2131]: df = df.astype(str)
In [2133]: df.loc[3] = [True, False]

In [2134]: df
Out[2134]: 
      a      b
0     1      3
1     2      4
3  True  False

But I want to know the reason behind this behaviour. Why doesn't pandas automatically change the column dtypes to object when I append booleans?

My Pandas version is:

In [2150]: pd.__version__
Out[2150]: '1.1.0'
Mayank Porwal
  • `'1.1.0'` is my pandas version. – Mayank Porwal Dec 23 '20 at 08:07
  • Booleans are ints (at least in standard Python). If you do df.loc[2] = ['3', '4'], the columns change to object. – Dani Mesejo Dec 23 '20 at 08:08
  • @DaniMesejo Yes, I know if I append list of string values, dtypes changes to `object`. Not sure why this behaviour is not replicated with `boolean`. – Mayank Porwal Dec 23 '20 at 08:11
  • In my opinion, mixing types is not recommended, so it may well behave buggily. Same problem if you use `df.append(pd.Series([True, False], index=['a','b']), ignore_index=True)` – jezrael Dec 23 '20 at 08:13
  • @MayankPorwal Because, as I said, in Python (not sure if pandas and numpy do the same) booleans are a subclass of integers: https://docs.python.org/3/reference/datamodel.html#index-10 – Dani Mesejo Dec 23 '20 at 08:13
  • Also this: https://stackoverflow.com/questions/2764017/is-false-0-and-true-1-an-implementation-detail-or-is-it-guaranteed-by-the – Dani Mesejo Dec 23 '20 at 08:14
  • Agreed with Dani, since Python booleans are ints: `1+True` returns `2`, and in the same way `True+False` returns the int `1`. – anky Dec 23 '20 at 08:18
  • @jezrael Yes, same problem with a Series too. – Mayank Porwal Dec 23 '20 at 08:20
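The comments' point that Python booleans are integers can be checked in plain Python. This only shows Python's own types, not how pandas picks a dtype:

```python
# bool is a subclass of int, so True and False act as 1 and 0 in arithmetic
is_subclass = issubclass(bool, int)
total = 1 + True
mixed = True + False

print(is_subclass)   # True
print(total)         # 2
print(mixed)         # 1
print(type(mixed))   # <class 'int'>
```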

2 Answers


Why is it not automatically changing the dtypes of columns to object when I append boolean to it?

Because the types are being upcast (see "upcasting" in the pandas documentation). From the documentation:

Types can potentially be upcasted when combined with other types, meaning they are promoted from the current type (e.g. int to float).

Upcasting works according to numpy rules:

Upcasting is always according to the numpy rules. If two different dtypes are involved in an operation, then the more general one will be used as the result of the operation.

To see how the NumPy rules are applied, you can use np.result_type (the older np.find_common_type is deprecated in newer NumPy releases), as below:

import numpy as np

res = np.result_type(np.bool_, np.int64)
print(res)

Output

int64
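A few more pairs run through NumPy's promotion rule show that bool is promoted to the other numeric type rather than to object. This is a sketch using np.result_type only; pandas has its own casting logic in some code paths, so treat it as an illustration of the NumPy rule, not of every pandas operation:

```python
import numpy as np

# Each pair is promoted to the more general dtype under NumPy's rules
pairs = [
    (np.bool_, np.bool_),
    (np.bool_, np.int8),
    (np.bool_, np.int64),
    (np.int64, np.float64),
]

for left, right in pairs:
    common = np.result_type(left, right)
    print(np.dtype(left), "+", np.dtype(right), "->", common)
# bool + bool -> bool
# bool + int8 -> int8
# bool + int64 -> int64
# int64 + float64 -> float64
```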
Dani Mesejo

When you do df.loc[0], the row is returned as a pd.Series:

>>> type(df.loc[0])
<class 'pandas.core.series.Series'>

A Series can only have a single dtype, so the booleans are coerced to integers.

To get the row as a DataFrame instead, index with a list, df.loc[[0]]:

>>> type(df.loc[[0]])
<class 'pandas.core.frame.DataFrame'>

However, df.loc[[...]] only selects existing rows; it cannot create new ones. So to add a row this way, you first need to create it (for example with a placeholder value) and then assign the values with df.loc[[...]].

Here is how selecting a row with df.loc[[...]] differs from df.loc[...]:

>>> df = pd.DataFrame({'a': [1,2], 'b':[3,4]})
>>> df.loc[0]
a    1
b    3
Name: 0, dtype: int64
>>> df.loc[[0]]
   a  b
0  1  3
>>> 

The first call returns a Series with a single dtype, whereas the second returns a one-row DataFrame.
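The single-dtype point is easier to see with columns of different dtypes. Here is a sketch with a hypothetical frame mixing an int and a float column (not the question's data): selecting the row as a Series forces one common dtype, while the one-row DataFrame keeps each column's dtype.

```python
import pandas as pd

# Hypothetical frame with an int column and a float column
df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

row_series = df.loc[0]    # Series: the whole row shares one dtype
row_frame = df.loc[[0]]   # one-row DataFrame: each column keeps its dtype

print(row_series.dtype)   # float64, so the value 1 in column a becomes 1.0
print(row_frame.dtypes)   # per column: a is int64, b is float64
```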

So for this case, first create row 3 with a placeholder value (0 here), then assign the booleans to it with df.loc[[...]]:

>>> df = pd.DataFrame({'a': [1,2], 'b':[3,4]})
>>> df
   a  b
0  1  3
1  2  4
>>> df.loc[2] = [3, 4]
>>> df
   a  b
0  1  3
1  2  4
2  3  4
>>> df.loc[3] = 0
>>> df
   a  b
0  1  3
1  2  4
2  3  4
3  0  0
>>> df.loc[[3]] = [True, False]
>>> df
      a      b
0     1      3
1     2      4
2     3      4
3  True  False
>>> 
U13-Forward