Is there a Python pandas function similar to R's dplyr::mutate(), which can add a new column to grouped data by applying a function to one of the columns of the grouped data? Below is a detailed explanation of the problem:
I generated sample data using this code:
x <- data.frame(country = rep(c("US", "UK"), 5), state = c(letters[1:10]), pop=sample(10000:50000,10))
Now I want to add a new column that holds the maximum population within each country (US and UK), repeated on every row of that country. I can do it using the following R code...
x <- group_by(x, country)
x <- mutate(x,max_pop = max(pop))
x <- arrange(x, country)
...or equivalently, using the R dplyr pipe operator:
x %>% group_by(country) %>% mutate(max_pop = max(pop)) %>% arrange(country)
So my question is: how do I do this in Python using pandas? I tried the following, but it did not work:
x['max_pop'] = x.groupby('country').pop.apply(max)
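For comparison, here is a minimal sketch of what a mutate()-style result might look like in pandas, assuming x is a DataFrame with the same three columns as the R sample. The likely problem with the attempt above is that an aggregation such as apply(max) returns one value per group, indexed by country, so it cannot align with the row index of x. The attribute access .pop may also collide with the DataFrame.pop method, so bracket selection is used here. The seed and the way the sample data is rebuilt are my own assumptions, not part of the original R code.

```python
import numpy as np
import pandas as pd

# Rebuild sample data shaped like the R data frame.
# The seed is an arbitrary choice so the run is repeatable.
rng = np.random.default_rng(0)
x = pd.DataFrame({
    "country": ["US", "UK"] * 5,
    "state": list("abcdefghij"),
    "pop": rng.choice(np.arange(10000, 50001), size=10, replace=False),
})

# transform() returns one value per original row, aligned to x's index,
# which is the behaviour of mutate() on grouped data.
# ["pop"] is used instead of .pop to avoid any clash with DataFrame.pop.
x["max_pop"] = x.groupby("country")["pop"].transform("max")

# Rough equivalent of arrange(country); a stable sort keeps the
# original order of rows within each country.
x = x.sort_values("country", kind="stable").reset_index(drop=True)
print(x)
```

The design choice here is transform() rather than apply() or agg(): it broadcasts the group-level maximum back onto every row, so the assignment lines up with the original rows without a merge.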