0

I'm trying to divide two columns via month by amount grouped by their region.

My dataset looks like this:

month   Amount  Region
10      2       APAC
20      5       EMEA
10      3       APAC
10      4       NA
4       3       NA

I have tried the below code however it gives me incorrect answer

a <- t(aggregate(data$month/data$Amount, by=list(region=data$Region), FUN=sum))

I get the below results:

       [,1]       [,2]       [,3]      
region "APAC"     "EMEA"     "NA"      
x      "8.333333" "4.000000" "3.833333"

The desired output is:

       [,1]       [,2]       [,3]      
region "APAC"     "EMEA"     "NA"      
x      "4            "4"      3"

However it is incorrect. Can anyone tell where am I going wrong?

franiis
  • 1,378
  • 1
  • 18
  • 33

1 Answers1

0

We can first add NA level in Region, then aggregate by Region and take sum and finally divide the columns.

df$Region <- addNA(df$Region)
df1 <- aggregate(.~Region, df, sum)
df1$output <- df1$month/df1$Amount

df1
#  Region month Amount output
#1   APAC    20      5      4
#2   EMEA    20      5      4
#3   <NA>    14      7      2

Same using dplyr can be done as

library(dplyr)

df %>%
  group_by(Region) %>%
  summarise_at(vars(month, Amount), sum) %>%
  mutate(output = month/Amount)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I tried this, however i get this : Error in Summary.factor(c(4L, 1L), na.rm = FALSE) : ‘sum’ not meaningful for factors..I am using the dplyr library – Pooja Gangurde Jun 11 '19 at 11:34
  • @PoojaGangurde First change `month` and `Amount` column to numeric then. Do `df[1:2] <- lapply(df[1:2], function(x) as.numeric(as.character(x)))` and then use any of the above solution. – Ronak Shah Jun 11 '19 at 11:58
  • I changed it too numeric...still the same error...i tried changing to numeric with this as.numeric(as.character(data$month)) as well, yet the same error... – Pooja Gangurde Jun 11 '19 at 12:31
  • @PoojaGangurde Can you update your post with `dput(data)` ? – Ronak Shah Jun 11 '19 at 12:34
  • sorry m fairly new to R,is this you want structure(list(Name = c("Pooja", "Nayana", "Arhant", "Maya", "Maya"), Surname = c("Gangurde", "Gangurde", "Gangurde", "Gangurde", "Gangurde"), Title = c("Ms", "Ms", "Ms", "Ms", "Ms"), Designation = c("Analyst", "DM", "DM", "DM", "DM"), year = c(10, 20, 10, 10, 10), month = c(10, 20, 10, 10, 4), Amount = c(2, 5, 3, 4, 3), Region = c("APAC", "EMEA", "APAC", "LATAM", "LATAM")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame")) – Pooja Gangurde Jun 11 '19 at 12:40
  • @PoojaGangurde Thank you..I updated the answer accordingly. Please check the `dplyr` answer. – Ronak Shah Jun 11 '19 at 12:42