
I have a data frame called Revenue that lists dates, cities and revenues.

Date                      City               Revenue
1989-02-25                LA                 50
1989-02-25                NY                 72
1989-02-25                PAR                65
1989-02-25                ROM                71
1989-02-26                NY                 82
1989-02-26                BAC                73
1989-02-27                TOK                55
1989-02-27                BTH                83
1989-02-27                PAR                69
1989-02-27                NY                 70
1989-02-28                NY                 45
1989-03-01                HEL                95
#With 7000 more rows

What I'm trying to do is select the dates that occur exactly four times, in the example above 1989-02-25 and 1989-02-27, and so forth. The tibble should look something like this:

Date                      City               Revenue
1989-02-25                LA                 50
1989-02-25                NY                 72
1989-02-25                PAR                65
1989-02-25                ROM                71
1989-02-27                TOK                55
1989-02-27                BTH                83
1989-02-27                PAR                69
1989-02-27                NY                 70
#With 1251 more rows

The next step is to filter the tibble so that only rows with a revenue at or above 45 remain. The first rows will look like the ones above, but there should be fewer rows overall.

After that, the tibble should be reduced to the lowest revenue per date. It should look like this (the city column is removed here with Revenue$City <- NULL):

Date                        Revenue
1989-02-25                  50
1989-02-27                  55
#With 57 more rows

Does anyone have any ideas? It is quite challenging with so many steps.

halfer
Henry Oufh
  • One question per question please! If you look up each of these steps separately, you'll be able to find answers. – socialscientist Aug 04 '22 at 16:21
  • Do you mean average or lowest values above 40? – dcsuka Aug 04 '22 at 16:22
  • Does this answer your question? [Counting the number of elements with the values of x in a vector](https://stackoverflow.com/questions/1923273/counting-the-number-of-elements-with-the-values-of-x-in-a-vector) – socialscientist Aug 04 '22 at 16:23
  • Does this answer your question? [how to filter data by the number of unique values in R](https://stackoverflow.com/questions/58269779/how-to-filter-data-by-the-number-of-unique-values-in-r) – user438383 Aug 04 '22 at 16:47
  • @socialscientist . Absolutely. I understand. – Henry Oufh Aug 04 '22 at 19:35
  • @dcsuka I'm looking for lowest value above 45. Will take a look at the answer below. – Henry Oufh Aug 04 '22 at 19:35

1 Answer


Here is a solution that involves some grouped filtering.

library(dplyr)  # provides %>%, as_tibble(), mutate(), group_by(), filter(), summarise()

df <- read.table(text = "Date                      City               Revenue
1989-02-25                LA                 50
1989-02-25                NY                 72
1989-02-25                PAR                65
1989-02-25                ROM                71
1989-02-26                NY                 82
1989-02-26                BAC                73
1989-02-27                TOK                55
1989-02-27                BTH                83
1989-02-27                PAR                69
1989-02-27                NY                 70
1989-02-28                NY                 45
1989-03-01                HEL                95") %>%
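  janitor::row_to_names(1) %>%            # promote the first row to column names
  as_tibble() %>%
  mutate(Date = lubridate::ymd(Date),     # parse the dates
         Revenue = as.integer(Revenue)) %>%
  group_by(Date) %>%
  # n() counts every row of the date, so the count is not affected by the Revenue condition.
  # Use Revenue >= 45 instead if a revenue of exactly 45 should be kept.
  filter(n() == 4,
         Revenue > 45) %>%
  summarise(Revenue = min(Revenue))       # lowest remaining revenue per date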

# # A tibble: 2 × 2
#   Date       Revenue
#   <date>       <int>
# 1 1989-02-25      50
# 2 1989-02-27      55
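
If the intermediate tibbles from the question are needed as well, the same logic can be split into three separate steps. This is only a sketch under a few assumptions: it rebuilds the twelve sample rows by hand, it uses `>= 45` to follow the "at or above 45" wording of the question, and the names `four_per_date`, `at_least_45` and `lowest_per_date` are made up for illustration.

```r
library(dplyr)

# Sample rows copied from the question (the real data has about 7000 more rows).
Revenue <- tibble(
  Date = as.Date(c("1989-02-25", "1989-02-25", "1989-02-25", "1989-02-25",
                   "1989-02-26", "1989-02-26",
                   "1989-02-27", "1989-02-27", "1989-02-27", "1989-02-27",
                   "1989-02-28", "1989-03-01")),
  City = c("LA", "NY", "PAR", "ROM", "NY", "BAC",
           "TOK", "BTH", "PAR", "NY", "NY", "HEL"),
  Revenue = c(50L, 72L, 65L, 71L, 82L, 73L, 55L, 83L, 69L, 70L, 45L, 95L)
)

# Step 1: keep only the dates that occur exactly four times.
four_per_date <- Revenue %>%
  group_by(Date) %>%
  filter(n() == 4) %>%
  ungroup()

# Step 2: keep only rows with a revenue at or above 45 (assumed threshold rule).
at_least_45 <- four_per_date %>%
  filter(Revenue >= 45)

# Step 3: lowest remaining revenue per date; City is dropped by summarise().
lowest_per_date <- at_least_45 %>%
  group_by(Date) %>%
  summarise(Revenue = min(Revenue))

print(lowest_per_date)
```

The order of the steps matters: counting the rows per date before applying the revenue threshold means a date with four rows is kept even if some of its rows are later dropped for being below 45. Swapping Step 1 and Step 2 would instead require four rows at or above 45.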
dcsuka