Ifelse statement within R's summarize function: dplyr

Question

I'm trying to count, in R, the number of visits each provider has conducted, counting only visits that meet a condition. With the commented-out line I get each provider with the correct total number of visits, but when I add an if statement, each provider is repeated multiple times and the visit count is wrong.

TeleHealth_Counts %>%
  group_by(TeleHealth_Counts$`Visit Provider`) %>%
  summarize(Video_Count = ifelse(`Type` ==  "Video Visit New", NA, sum(`Visit Count`, na.rm = TRUE)))
  #summarize(Tele_Count = sum(`Visit Count`, na.rm = TRUE))

The other issue I'm facing is that when I assign this code to a variable so I can download the data, I'm getting an error: summarise() regrouping output by 'TeleHealth_Counts$Visit Provider' (override with .groups argument). How do I overcome this error or download the data frame I'm seeing in my console?

I've tried assigning it to a variable, to Tele_Count and to the data frame df_phys with the code below.

physicians <- unique(TeleHealth_Counts$`Visit Provider`)
df_phys <-data.frame(physicians)

The `.groups` thing is a warning, not an error. While it *can* be ignored (it is not breaking/stopping execution), it is safer to be explicit by adding `.groups=` to your summarise, like the warning suggested. See `?summarize` for details. As for repeated providers ... it could be anything. If you post sample data that reproduces that problem, we can help. (See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info) — r2evans, Oct 10 '20 at 00:49

Ronak Shah · Accepted Answer · 2020-10-10T00:55:53.650

`Type == "Video Visit New"` produces a logical vector with one element per row in the group, and `ifelse()` returns a result of the same length as the condition it is given. `summarize()` therefore gets several values per provider instead of one, which is why each provider is repeated.
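The length difference can be seen in base R without any data frame. This is a minimal sketch; the vectors `type` and `visits` are made-up values standing in for one provider's rows.

```r
# Hypothetical values for a single provider's rows.
type   <- c("Video Visit New", "Office Visit", "Video Visit New")
visits <- c(2, 5, 1)

# ifelse() is vectorized: it returns one element per element of the condition.
per_row <- ifelse(type == "Video Visit New", NA, sum(visits, na.rm = TRUE))
length(per_row)    # 3, one value per row

# any() collapses the condition to a single TRUE/FALSE, so if/else
# yields a single value for the whole group.
per_group <- if (any(type == "Video Visit New")) NA_real_ else sum(visits, na.rm = TRUE)
length(per_group)  # 1
```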

Try the following, which also groups by the bare column name rather than `TeleHealth_Counts$...`:

library(dplyr)

result <- TeleHealth_Counts %>%
  group_by(`Visit Provider`) %>%
  summarize(Video_Count = if(any(`Type` ==  "Video Visit New")) NA_real_ 
                          else sum(`Visit Count`, na.rm = TRUE))
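Note that the code above returns `NA` for any provider who has at least one "Video Visit New" row. If the goal is instead to total only the video visits per provider (an assumption about the question's intent, not something the question states), subsetting inside `sum()` is one way to sketch it. The sample data frame below is invented for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical sample data; column names follow the question.
TeleHealth_Counts <- data.frame(
  `Visit Provider` = c("A", "A", "B", "B"),
  Type             = c("Video Visit New", "Office Visit", "Office Visit", "Office Visit"),
  `Visit Count`    = c(2, 5, 3, 4),
  check.names      = FALSE
)

video_counts <- TeleHealth_Counts %>%
  group_by(`Visit Provider`) %>%
  summarize(
    # Sum only the rows whose Type matches; a provider with no match gets 0.
    Video_Count = sum(`Visit Count`[Type == "Video Visit New"], na.rm = TRUE),
    .groups = "drop"
  )

video_counts
```

Because `sum()` reduces the subset to a single number, each provider appears exactly once in the output.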

What you are seeing is an informational message, not an error, so it does not stop the code from running and is safe to ignore. It has been the default behavior since dplyr 1.0.0; to silence it, pass the `.groups` argument explicitly, for example `.groups = "drop"`. To write the resulting data frame to a CSV file, you can use write.csv:

write.csv(result, 'result.csv', row.names = FALSE)
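Putting the two pieces together, the sketch below passes `.groups = "drop"` so that no message is printed, then writes the result and reads it back. The data frame `df` and the temporary file path are assumptions made for the example; in practice a path such as 'result.csv' would be used.

```r
library(dplyr)

# Hypothetical sample data; column names follow the question.
df <- data.frame(
  `Visit Provider` = c("A", "A", "B"),
  `Visit Count`    = c(2, 5, 3),
  check.names      = FALSE
)

result <- df %>%
  group_by(`Visit Provider`) %>%
  # .groups = "drop" returns an ungrouped tibble and suppresses the message.
  summarize(Tele_Count = sum(`Visit Count`, na.rm = TRUE), .groups = "drop")

# A temporary file is used here; replace it with a real path as needed.
out_path <- tempfile(fileext = ".csv")
write.csv(result, out_path, row.names = FALSE)

# Reading the file back shows one row per provider.
read.csv(out_path, check.names = FALSE)
```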

edited Oct 10 '20 at 00:55

answered Oct 10 '20 at 00:48

Ronak Shah

RonakShah, do you work `NA_real_` into your flow? While many (tidyverse) tools will be fine with the distinction, I've found other tools to be confused by a `lgl` column when something I did produced nothing summarizable, so `NA` is never cast to `NA_real_`. – r2evans Oct 10 '20 at 00:52

I usually use only `NA` (mostly because I don't remember to use `NA_real_/NA_integer_`) unless those `tidyverse` functions "shout". But I think it is always better to be explicit. – Ronak Shah Oct 10 '20 at 00:55
Even if they don't complain, I think it may also be "declarative" in a sense, declaring what I think I should be getting out of this. Thus is the need for many of the `map_*` *typed* list-application functions in `purrr`, as well as `vapply` (base R). Thanks. – r2evans Oct 10 '20 at 00:58
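
The type distinction raised in these comments can be checked in base R. This is a small illustrative sketch; the list passed to `vapply` is made-up data.

```r
# Plain NA is logical; the typed constants carry their own type.
class(NA)           # "logical"
class(NA_real_)     # "numeric"
class(NA_integer_)  # "integer"

# A column built only from plain NA ends up logical ...
class(c(NA, NA))              # "logical"
# ... while NA_real_ keeps it numeric even when nothing was summarizable.
class(c(NA_real_, NA_real_))  # "numeric"

# vapply() declares the expected type and length of each result up front
# and stops with an error if a result does not match.
totals <- vapply(list(a = c(1, 2), b = c(3, NA)), sum, numeric(1), na.rm = TRUE)
totals
```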
