1

I have a dataframe called sales.

I would like to sum each month of sales by the common identifier, ID.

What I am finding hard, is how to do this conditional on there being a return value.

For example, when I sum sales_Feb1980 for ID 2, only the observations with returns should be included and the ones without should be ignored?

What the data looks like.. I have many more rows and columns.

   id    name  Returns_jan1980 Returns_feb1980 Sales_Jan1980 Sales_Feb1980   
  2b    LA         4.7%            5.6%             100              100
  2b    LA         5.7%                             250              100
  2b    LA         6.3%            5.8%             50               100
  5a    NY                         5.6%             30               200
  5a    NY         5.7%            3.6%             50               100

Correct output

  id   name Sales_Jan1980 Sales_Jan19080   
  2       LA     400          200        
  5       NY     50           300   

Any help is appreciated.

CISCO
  • 539
  • 1
  • 4
  • 14

2 Answers2

0

Using base R, we can first find out "Returns" and "Sales" columns and assuming they are in correct order we can turn the empty "Return" columns to 0 in their corresponding "Sales" column and then use aggregate to group by id and Name and take sum.

return_cols <- grep("^Returns", names(df))
sales_cols <- grep("^Sales", names(df))    

df[sales_cols][df[return_cols] == ""] <- 0
aggregate(cbind(Sales_Jan1980, Sales_Feb1980)~id + name, df, sum)

#  id name Sales_Jan1980 Sales_Feb1980
#1 2b   LA           400           200
#2 5a   NY            50           300
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

One dplyr and tidyr possibility could be:

df %>%
 rowid_to_column() %>%
 gather(var, val, -c(id, rowid, name)) %>%
 arrange(rowid, sub(".*_", "", var)) %>%
 group_by(rowid) %>%
 mutate(val = ifelse(is.na(lag(val, default = first(val))), NA, val)) %>%
 ungroup() %>%
 spread(var, val) %>%
 group_by(id, name) %>%
 summarise_at(vars(starts_with("Sales")), sum, na.rm = TRUE)

  id    name  Sales_Feb1980 Sales_Jan1980
  <chr> <chr>         <dbl>         <dbl>
1 2b    LA              200           400
2 5a    NY              300            50

It sets to NA the values in Sales columns based on values in Returns columns and then performs the sum.

tmfmnk
  • 38,881
  • 4
  • 47
  • 67