Subset by increasing count over time in R

Question

I have a data frame that contains count of sales by seller, year and month, called sales_by_month:

library(tidyverse)
sales_by_month <- tribble(
~Seller,      ~Year,    ~Month,   ~Sales,
"John Doe",    2018,    1,       82,
"John Doe",    2018,    2,       72,
"John Doe",    2018,    3,       42,
"Sally Jane",  2018,    1,       25,
"Sally Jane",  2018,    2,       77)

I need to subset this dataset by only those sellers where their sales are increasing over time, and I cannot figure out how to do it.

The resulting subset dataset should contain;

Seller      Year    Month   Sales
Sally Jane  2018    1       25
Sally Jane  2018    2       77

Because Sally's sales are increasing, while John's sales are decreasing.

Any help would be very much appreciated!!

Hi, It's helpful to include data in a form that we can easily put into our environment to work with. You can take a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to see how to make a good reproducible example! — Sahir Moosvi, Jan 08 '19 at 18:24
@sahir thanks for the link, I will work on adding a reproducible example! — Caitlin, Jan 08 '19 at 18:28
@ricardo The dataset is static, I don't need to manage dynamic changes! — Caitlin, Jan 08 '19 at 18:28
Can we assume the data is sorted? If so you could use `df[as.logical(ave(df$Sales, df$Seller, FUN = function(x) !any(x - shift(x) < 0, na.rm = TRUE))), ]` — Mike H., Jan 08 '19 at 18:31
How do you want to determine increasing? Just if the sales in the earliest month is greater than the latest? What if they drop in the middle but pick up again? — Sahir Moosvi, Jan 08 '19 at 18:33

Darren Tsai · Accepted Answer · 2019-01-08T18:52:27.197

First, make a variable representing the difference in Sales(I name it dif). If dif < 0, it means decreasing values exist in somebody's sales.

library(dplyr)

df %>% arrange(Seller, Year, Month) %>%
  group_by(Seller) %>%
  mutate(dif = c(0, diff(Sales))) %>%
  filter(all(dif >= 0)) %>%
  select(-dif) # drop dif

#   Seller     Year Month Sales
#   <fct>     <int> <int> <int>
# 1 SallyJane  2018     1    25
# 2 SallyJane  2018     2    77

More concise:

df %>% group_by(Seller) %>%
  arrange(Seller, Year, Month) %>%
  filter(all(c(0, diff(Sales)) >= 0))

Sahir Moosvi · Answer 2 · 2019-01-08T18:53:53.590

How to do this really depends on how you want to define increasing over time. One way to define increasing over time is if there is a month over month increase. My solution just looks at if the last month had an increase but this could be changed to look at it different ways:

We calculate the change every month. We filter for just the last month and if it was a positive change. Then we pull out unique Seller names.
We filter for the seller names we obtained in part 1.

The code below does this as well as a dataframe we can load straight into R

library(tidyverse)
sales_by_month <- tribble(
~Seller,      ~Year,    ~Month,   ~Sales,
"John Doe",    2018,    1,       82,
"John Doe",    2018,    2,       72,
"John Doe",    2018,    3,       42,
"Sally Jane",  2018,    1,       25,
"Sally Jane",  2018,    2,       77)


increased_from_last_month <- sales_by_month %>% 
  group_by(Seller) %>% 
  arrange(Seller, Year, Month) %>% 
  mutate(change = Sales - lag(Sales, default = 0)) %>% 
  summarise_all(last) %>% 
  filter(change > 0) %>% 
  pull(Seller) %>% 
  unique()


sales_by_month %>% 
  filter(Seller %in% increased_from_last_month)

Subset by increasing count over time in R

2 Answers2