I'm trying to add two DataFrames together in Python, having first set the index of each to one of its existing columns.

However, the top-rated method from the following thread raises an error:

(see: Adding two pandas dataframes)

Here is a simple example of the problem:

import pandas as pd
import numpy as np

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0)

a 

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0
C  1.0  2.0  3.0

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0)

b

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0

df_add = a.add(b,fill_value=1)

And the error:

Traceback (most recent call last):

  File "<ipython-input-150-885d92411f6c>", line 1, in <module>
    df_add = a.add(b,fill_value=1)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1234, in f
    return self._combine_frame(other, na_op, fill_value, level)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3490, in _combine_frame
    result = _arith_op(this.values, other.values)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3459, in _arith_op
    return func(left, right)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1195, in na_op
    result[mask] = op(xrav, yrav)

TypeError: must be str, not int

Any help on preventing this problem would be greatly appreciated.

user8188120

1 Answer

The problem is in how the DataFrame is defined: a 2D NumPy array holds a single dtype, so mixing letters with floats converts all of the data to strings:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
print (a)
[['A' '1.0' '2.0' '3.0']
 ['B' '1.0' '2.0' '3.0']
 ['C' '1.0' '2.0' '3.0']]
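
A quick way to confirm this is to inspect the dtypes directly. The following is only a minimal sketch; the variable names `df` and `exc` are introduced here for illustration, and the exact dtype pandas reports for string columns may differ between versions:

```python
import numpy as np
import pandas as pd

# Mixing letters and floats forces NumPy to pick one common dtype: a string type.
a = np.array([['A', 1., 2., 3.], ['B', 1., 2., 3.]])
print(a.dtype.kind)  # 'U' stands for a fixed-width Unicode string dtype

# The DataFrame built from that array therefore has no float columns.
df = pd.DataFrame(a).set_index(0)
print(df.dtypes)

# Adding the numeric fill value to such a string fails in the same way.
try:
    '1.0' + 1
except TypeError as exc:
    print(type(exc).__name__)
```

If the columns are not reported as float, arithmetic such as `add` with a numeric `fill_value` will end up combining strings with numbers.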

One solution is to remove the string values from the array and specify the index with a list:

a = np.array([[1.,2.,3.],[1.,2.,3.],[1.,2.,3.]])
a = pd.DataFrame(a, index=list('ABC'))

b = np.array([[1.,2.,3.],[1.,2.,3.]])
b = pd.DataFrame(b, index=list('AB'))

df_add = a.add(b,fill_value=1)
print (df_add)
     0    1    2
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0

Or convert the DataFrames to floats after setting the index:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0).astype(float)

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0).astype(float)

df_add = a.add(b,fill_value=1)
print (df_add)
     1    2    3
0               
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0
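
A further option, not part of the two approaches above, is to skip the NumPy array and pass a plain list of lists, so that pandas can infer a dtype per column. This is a sketch under that assumption; the names `rows_a` and `rows_b` are hypothetical:

```python
import pandas as pd

# Plain Python lists keep the floats as floats; only column 0 holds strings.
rows_a = [['A', 1., 2., 3.], ['B', 1., 2., 3.], ['C', 1., 2., 3.]]
rows_b = [['A', 1., 2., 3.], ['B', 1., 2., 3.]]

a = pd.DataFrame(rows_a).set_index(0)
b = pd.DataFrame(rows_b).set_index(0)

# Row 'C' is missing from b, so fill_value=1 is used in its place there.
df_add = a.add(b, fill_value=1)
print(df_add)
```

Here no `astype(float)` call is needed because the numeric columns are never converted to strings in the first place.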
jezrael