Python Pandas: Simple example of calculating RMSE from data frame

Question

I need a simple example of calculating RMSE with a Pandas DataFrame. Suppose there is a function that, inside a loop, returns a true value and a predicted value:

def fun (data):
   ...
   return trueVal, predVal

for data in set:
   fun(data)

Then some code puts these results into the following data frame, where x is the true value and p is the predicted value:

In [20]: d
Out[20]: {'p': [1, 10, 4, 5, 5], 'x': [1, 2, 3, 4, 5]}

In [21]: df = pd.DataFrame(d)

In [22]: df
Out[22]: 
    p  x
0   1  1
1  10  2
2   4  3
3   5  4
4   5  5

Questions:

1) How do I put the results from the fun function into the df data frame?

2) How do I calculate RMSE using the df data frame?

Check this: http://stackoverflow.com/questions/17197492/root-mean-square-error-in-python — Mohammad Yusuf, Dec 26 '16 at 09:27
Possible duplicate of [Root mean square error in python](https://stackoverflow.com/questions/17197492/root-mean-square-error-in-python) — Jim G., Sep 24 '17 at 15:41

piRSquared · Answer 1 · 2017-03-29T17:51:34.373

28

Question 1
This depends on the format your data is in. I'd expect you already have your true values, so this function is just a pass-through.

Question 2

With pandas
((df.p - df.x) ** 2).mean() ** .5
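A self-contained version of the pandas one-liner, run on the data frame from the question. The helper name rmse_pandas and its column-name parameters are illustrative choices, not part of pandas:

```python
import pandas as pd

# Data from the question: p is the prediction, x is the true value
df = pd.DataFrame({'p': [1, 10, 4, 5, 5], 'x': [1, 2, 3, 4, 5]})

def rmse_pandas(frame, pred_col='p', true_col='x'):
    """Root mean squared error between two columns of a DataFrame."""
    errors = frame[pred_col] - frame[true_col]  # per-row residuals
    return float((errors ** 2).mean() ** 0.5)   # square root of the mean squared residual

print(rmse_pandas(df))
```

Here the residuals are 0, 8, 1, 1, 0, so the mean squared error is 66 / 5 = 13.2 and the RMSE is its square root, roughly 3.633.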

With numpy
(np.diff(df.values) ** 2).mean() ** .5
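The np.diff version works because df.values is an (n, 2) array and np.diff along its last axis gives x - p for each row; the sign disappears once squared. It does rely on df having exactly those two columns. A sketch of a variant that selects the columns by name instead (rmse_numpy is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'p': [1, 10, 4, 5, 5], 'x': [1, 2, 3, 4, 5]})

def rmse_numpy(pred, true):
    """RMSE between two equal-length 1-D arrays."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Selecting columns by name works regardless of column order or extra columns
by_name = rmse_numpy(df['p'].to_numpy(), df['x'].to_numpy())

# The np.diff form from the answer: differences along the last axis of the (n, 2) array
by_diff = float((np.diff(df.values) ** 2).mean() ** .5)

print(by_name, by_diff)
```

On this two-column frame both expressions give the same value.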

edited Mar 29 '17 at 17:51

answered Jan 04 '17 at 00:58

piRSquared

285,575
57
475
624


shouldn't it be `((df.p - df.x) ** 2).mean() ** .5` for pandas, as it's root **mean** squared error? – Zhang Tianbao Mar 29 '17 at 17:31

Username doesn't check out :) – Joseph Sheedy Sep 06 '19 at 23:21

score 2 · Answer 2 · answered Jun 01 '20 at 09:14

Question 1

I understand you already have a data frame df. To add the new values as new rows, do the following:

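import pandas as pd

for data in set:
    trueVal, predVal = fun(data)
    auxDf = pd.DataFrame([[predVal, trueVal]], columns=['p', 'x'])
    # DataFrame.append returned a new frame instead of modifying df, and it was
    # removed in pandas 2.0; pd.concat does the same job, so reassign the result
    df = pd.concat([df, auxDf], ignore_index=True)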
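Growing a frame one row at a time copies it on every iteration. A common alternative is to collect the results in a plain list and build the DataFrame once at the end. A minimal sketch, where fun and inputs are hypothetical stand-ins for the question's elided function and data set:

```python
import pandas as pd

def fun(data):
    """Hypothetical stand-in for the question's fun: returns (true, predicted)."""
    true_val = data
    pred_val = data * 2  # placeholder "prediction", for illustration only
    return true_val, pred_val

inputs = [1, 2, 3, 4, 5]  # hypothetical input set

# Collect one dict per result, then build the DataFrame in a single call
rows = []
for data in inputs:
    true_val, pred_val = fun(data)
    rows.append({'p': pred_val, 'x': true_val})

df = pd.DataFrame(rows, columns=['p', 'x'])
print(df)
```

If df already holds earlier results, the new frame can be joined to it with a single pd.concat call after the loop.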

Question 2

To calculate RMSE using df, I recommend using scikit-learn's mean_squared_error function.

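from sklearn.metrics import mean_squared_error

realVals = df.x
predictedVals = df.p
mse = mean_squared_error(realVals, predictedVals)
# If you want the root mean squared error, take the square root of the MSE.
# (Older scikit-learn versions also accept squared=False; since 1.4 there is
# sklearn.metrics.root_mean_squared_error, and squared= is deprecated.)
rmse = mse ** 0.5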

It's very important that the columns contain no null (NaN) values; otherwise mean_squared_error raises a ValueError.
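One way to handle missing values is to drop incomplete rows before computing the metric. A minimal sketch with an illustrative frame containing one missing prediction. Note that pandas' Series.mean skips NaN by default (skipna=True), so a plain pandas expression silently ignores such rows rather than failing:

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing prediction
df = pd.DataFrame({'p': [1, 10, np.nan, 5, 5], 'x': [1, 2, 3, 4, 5]})

# Keep only rows where both the prediction and the true value are present
clean = df.dropna(subset=['p', 'x'])
rmse = float(((clean['p'] - clean['x']) ** 2).mean() ** 0.5)

# Series.mean skips NaN by default, so this gives the same number here
rmse_skipna = float(((df['p'] - df['x']) ** 2).mean() ** 0.5)

print(len(clean), rmse, rmse_skipna)
```

Dropping the rows explicitly makes the choice visible, and the same clean frame can then be passed to mean_squared_error.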
