1

I have two Pandas DataFrames, with mostly different data:

err_df =
            2        3        11        13         14         16
4   122.153000  56.3023  21.2722  71.79590   81.63212        NaN   
8    70.967800  19.5768  69.9780  21.11050  116.89777        NaN   
12   70.659100  19.5768      NaN  39.46288   70.62480  70.597850
16   19.237067      NaN      NaN  18.93980   18.60660  19.104767
20   19.349440      NaN      NaN  19.38080        NaN  36.785533
24         NaN      NaN      NaN  17.92060        NaN        NaN 

temp_df =
         2        3        11       13       14       16
4   89.5488  122.153  121.957  122.153  122.153      NaN
8   89.5488  122.153  121.957  122.153  122.153      NaN
12  89.5488  122.153      NaN  122.153  122.153  122.153
16  89.5488      NaN      NaN  122.153  122.153  122.153
20  89.5488      NaN      NaN  122.153      NaN  122.153
24      NaN      NaN      NaN  122.153      NaN      NaN  

I want to calculate the Root Mean Squared Error (RMSE) between the columns of both the DataFrames and store the results in a 3rd DataFrame. I know how to calculate the RMSE for an individual column, let's say 2:

print(((err_df[2] - temp_df[2])**2).mean()**0.5)
result = 48.2427158719

There's no trouble with the NaN characters either - they're ignored, which is a relief because I think using sklearn's mean_square_error function gives this error ValueError: Array contains NaN or infinity.
Basically, I want to be able to calculate the RMSE values "dynamically" and not having to change the columns each time I run the main program.
The 3rd DataFrame that holds the results should look something like this:

df3 =
              2        3        11        13         14         16
0   48.2427158719  "RMSE"    "RMSE"    "RMSE"     "RMSE"     "RMSE"  

How do I achieve this?
Any help is appreciated. Thanks in advance :)
(Using Ubuntu 14.04 32-Bit VM and Python 2.7)

user136819
  • 205
  • 7
  • 21

1 Answers1

2
((err_df-temp_df)**2).mean(0)**0.5
Out[318]: 
2     48.242716
3     91.978382
11    80.122548
13    92.792388
14    61.332234
16    82.793873
dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks. I didn't realise that it was a simple case of removing `[2]`. And how do I orient the view to look like the desired output shown in my question? Apparently there's no `orient='index'` keyword in `pd.DataFrame()` like it is in `pd.DataFrame.from_dict()` – user136819 Nov 12 '17 at 20:44
  • @user136819 you can add `to_frame().T` in the end – BENY Nov 12 '17 at 21:07