0

I have the following dataframe:

tibble(year = c("2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020END","2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020Q1","2020Q2", "2020Q3", "2020Q4", "2020END"), 
       website = c("google","google","google","google","google", "facebook","facebook","facebook","facebook","youtube","youtube","youtube","youtube", "youtube"), 
       something = c(1,2,3,4,5,6,7,8,9,10,11,12,13, 14)
       
)

How can i make such a condition so that:

When Q4 and END are mentioned for the same website, then to take only the END values - but this is not the case only take Q4 values?

Desired output is the following table:

tibble(year = c("2020Q1", "2020Q2", "2020Q3","2020END","2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020Q1","2020Q2", "2020Q3",  "2020END"), 
       website = c("google","google","google","google", "facebook","facebook","facebook","facebook","youtube","youtube","youtube","youtube"), 
       something = c(1,2,3,5,6,7,8,9,10,11,12, 14)
       
)

I have tried using filter() but i assume there is a better condition, maybe pivoting? Assume I am not working with dates in the year column - as it really is only the string i am searching for Q4 and END to occur

Beans On Toast
  • 903
  • 9
  • 25

1 Answers1

1

An idea,

library(tidyverse)

df %>% 
 group_by(website) %>% 
 mutate(new = sum(grepl('Q4|END', year)) == 2 & grepl('Q4', year)) %>% 
 filter(!new) %>% 
 select(-new)

# A tibble: 12 x 3
# Groups:   website [3]
   year    website  something
   <chr>   <chr>        <dbl>
 1 2020Q1  google           1
 2 2020Q2  google           2
 3 2020Q3  google           3
 4 2020END google           5
 5 2020Q1  facebook         6
 6 2020Q2  facebook         7
 7 2020Q3  facebook         8
 8 2020Q4  facebook         9
 9 2020Q1  youtube         10
10 2020Q2  youtube         11
11 2020Q3  youtube         12
12 2020END youtube         14
Sotos
  • 51,121
  • 6
  • 32
  • 66