I have the same data set, but when I calculate the mean in R and in Python separately, I get different values.

For Python I am using Anaconda / Jupyter Notebook, and for R I am using RStudio.

Python Code

group = matrix.groupby(['date_block_num']).agg({'item_cnt_month': ['mean']})
group.columns = ['date_avg_item_cnt']
group.reset_index(inplace=True)

group.head(10)
+----------------+-------------------+
| date_block_num | date_avg_item_cnt |
+----------------+-------------------+
|              0 |          0.347168 |
|              1 |          0.324463 |
|              2 |          0.355469 |
|              3 |          0.275391 |
|              4 |          0.265137 |
|              5 |          0.283203 |
|              6 |          0.276855 |
|              7 |          0.316650 |
|              8 |          0.308105 |
|              9 |          0.290039 |
+----------------+-------------------+

R Code

date_avg_item_cnt <- matrix %>% 
    group_by(date_block_num) %>% 
    dplyr::summarise(date_avg_item_cnt = mean(item_cnt_month)) %>% 
    ungroup()

head(date_avg_item_cnt %>% as.data.frame, 10)
+----------------+-------------------+
| date_block_num | date_avg_item_cnt |
+----------------+-------------------+
|              0 |         0.3471760 |
|              1 |         0.3244102 |
|              2 |         0.3555534 |
|              3 |         0.2753490 |
|              4 |         0.2652090 |
|              5 |         0.2831754 |
|              6 |         0.2768849 |
|              7 |         0.3167089 |
|              8 |         0.3081288 |
|              9 |         0.2900912 |
+----------------+-------------------+
alistaire
  • Don't know. Without [seeing the data you're using](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), we can't possibly reproduce this – camille Apr 07 '19 at 15:26
  • @camille you can download the data set from kaggle https://www.kaggle.com/c/competitive-data-science-predict-future-sales – Abdul Haseeb Apr 08 '19 at 07:34
  • If R and Python compute measures of central tendency with different techniques, the final results should differ only slightly. But in my case, when I predict the score using Python the result is 0.9, while the same technique in R gives 1.2. That is a huge difference. Any idea? It's a Kaggle competition that I am trying to solve in both languages to see the difference. – Abdul Haseeb Apr 08 '19 at 07:47

1 Answer

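The rounding seems to be different. After the decimal point Python shows 6 digits and R shows 7, but the values already disagree around the fourth decimal place, so display rounding alone does not account for it. The difference more likely results from the floating-point precision each environment is using for this column.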
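
One way to test the precision explanation is to check whether the R means, rounded to a lower-precision float, reproduce the Python output. The sketch below assumes (this is a hypothesis, not something stated in the question) that `item_cnt_month` was stored as half precision (`float16`) on the Python side. It uses only the values printed in the question and the standard library's `struct` module, whose `"e"` format is IEEE 754 half precision; `to_half` is a helper name made up for this illustration.

```python
import struct

def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("e", struct.pack("e", x))[0]

# Means printed by R in the question (computed in double precision).
r_means = [0.3471760, 0.3244102, 0.3555534, 0.2753490, 0.2652090,
           0.2831754, 0.2768849, 0.3167089, 0.3081288, 0.2900912]

# Means printed by pandas in the question (6 digits shown).
py_means = [0.347168, 0.324463, 0.355469, 0.275391, 0.265137,
            0.283203, 0.276855, 0.316650, 0.308105, 0.290039]

# Assumption under test: the pandas result is a half-precision value.
r_as_half = [to_half(r) for r in r_means]

for r, h, p in zip(r_means, r_as_half, py_means):
    print(f"R {r:.7f} -> half {h:.6f} | pandas showed {p:.6f}")

# Compare up to the 6-digit display precision of the pandas output.
all_match = all(abs(h - p) < 1e-6 for h, p in zip(r_as_half, py_means))
print("all rows match:", all_match)
```

If every row matches, the two languages agree on the mean and differ only in how many bits the result is stored in. In that case, checking `matrix['item_cnt_month'].dtype` and casting the column to `float64` before aggregating would be the natural next step on the pandas side.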

TRSi