I am trying to convert some R code (not written by me) to Python for a project. The R code uses aggregate() to compute grouped sums, but when I try to replicate this in Python with .groupby, the results differ: the R code yields a data frame with 479000+ rows, whereas Python yields 489000+ rows.
I later discovered that dplyr in R also has a group_by() function, and when I use it on the same large data frame, the result matches what .groupby yields in Python. The three versions are:
# R, dplyr version (matches the Python row count, 489000+)
test <- df %>%
  group_by(A, B, C) %>%
  summarise(D = sum(D, na.rm = TRUE), E = sum(E, na.rm = TRUE))
# R, aggregate() version from the original code (479000+ rows)
test <- aggregate(x = list(D = df$D, E = df$E),
                  by = list(A = df$A, B = df$B, C = df$C),
                  FUN = function(x) sum(x, na.rm = TRUE))
# Python, pandas version (489000+ rows); a list of columns is selected with double brackets
test = df.groupby(['A', 'B', 'C'], as_index=False)[['D', 'E']].agg('sum')
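Since I cannot share the real data, here is a minimal sketch on made-up toy data of one check that could narrow things down. It assumes, without confirmation, that missing values in the grouping columns might play a role; the toy values and the variable names `keys`, `with_na`, `without_na` and `rows_missing_key` are hypothetical. It relies on the `dropna` parameter of `DataFrame.groupby`, which requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the confidential data frame.
df = pd.DataFrame({
    "A": ["x", "x", "y", None],   # one missing grouping key
    "B": [1, 1, 2, 2],
    "C": ["p", "p", "q", "q"],
    "D": [1.0, np.nan, 3.0, 4.0],
    "E": [10.0, 20.0, np.nan, 40.0],
})

keys = ["A", "B", "C"]

# Keep groups whose key contains a missing value.
with_na = df.groupby(keys, as_index=False, dropna=False)[["D", "E"]].sum()

# Drop groups whose key contains a missing value (the pandas default).
without_na = df.groupby(keys, as_index=False, dropna=True)[["D", "E"]].sum()

# Number of input rows with at least one missing grouping key.
rows_missing_key = int(df[keys].isna().any(axis=1).sum())

print(len(with_na), len(without_na), rows_missing_key)
```

If the two row counts differ on the real data, missing grouping keys would be worth a closer look; if they are identical, the cause is presumably something else.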
I am fairly confident that I didn't make a coding mistake. Because the project I work on is confidential, the code above is the most I can share; sorry in advance.
Since the row counts differ, at least one of these approaches must be wrong for my purpose, and the two functions must work differently in some way. I would really like to know what the difference between them is and which function correctly does what I need (namely, the grouped sums). Thank you very much.