I am trying to find the five countries with the most deaths from COVID, using data from Johns Hopkins University.

My code:

    df_test <- df %>%
        group_by(region) %>%
        summarise(death = max(death)) %>%
        arrange(desc(death)) %>%
        top_n(5)

The result should be US, Brazil, India, Mexico, UK, but I'm getting Brazil, India, Mexico, UK, Italy.

Does anyone know what may be wrong? Thanks in advance!

EDIT: Also, with my professor's data set, max(df$death) returns NA.

Krzysztof Madej
lmnguyen
    Welcome to Stack Overflow! The odds of getting a useful answer increase if you provide a minimal reproducible example of your data. See how to do it here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – LuizZ Nov 29 '20 at 03:34

1 Answer

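Use na.rm = TRUE in the max() function. By default, max() returns NA if any value in the group is NA, so a region with missing values gets NA as its maximum and falls out of the top 5. That would explain why the US is missing from your result.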
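
A quick way to check this behaviour in isolation is with a small vector. This is only a sketch: `deaths` is a made-up vector standing in for one region's values, not the actual JHU data.

```r
# Hypothetical toy vector standing in for one region's death counts
deaths <- c(8, NA, 9)

# Default behaviour: a single missing value makes the result NA
default_max <- max(deaths)

# With na.rm = TRUE the missing values are dropped before taking the max
clean_max <- max(deaths, na.rm = TRUE)

# Count the missing values to see how much of the data is affected
n_missing <- sum(is.na(deaths))

print(default_max)
print(clean_max)
print(n_missing)
```

Running `sum(is.na(df$death))` on the real data frame should show whether missing values are the cause there as well.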

    library(dplyr)

    # Toy data: region A contains an NA in the death column
    df <- data.frame(region = c('A', 'A', 'B', 'C', 'A', 'B', 'C', 'C'),
                     death = c(8, NA, 6, 7, 9, 5, 4, 6),
                     stringsAsFactors = FALSE)

    # Per-region maximum, ignoring NAs, then keep the two largest
    df %>%
      group_by(region) %>%
      summarise(death = max(death, na.rm = TRUE)) %>%
      arrange(desc(death)) %>%
      top_n(2)

    # A tibble: 2 x 2
      region death
      <chr>  <dbl>
    1 A          9
    2 C          7
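
As a side note, `top_n()` is marked as superseded in recent dplyr releases in favour of `slice_max()`. A sketch of the same pipeline with `slice_max()`, assuming dplyr 1.0.0 or later and reusing the toy data from the example above (the name `top2` is only for illustration):

```r
library(dplyr)

# Same toy data as in the example above (made-up values, not the JHU data)
df <- data.frame(region = c("A", "A", "B", "C", "A", "B", "C", "C"),
                 death  = c(8, NA, 6, 7, 9, 5, 4, 6),
                 stringsAsFactors = FALSE)

top2 <- df %>%
  group_by(region) %>%
  # Per-region maximum, ignoring missing values
  summarise(death = max(death, na.rm = TRUE)) %>%
  # Keep the two regions with the largest maxima
  slice_max(death, n = 2)

print(top2)
```

For the original question, `n = 5` would take the place of `n = 2`. Note that `slice_max()` keeps ties by default, so it can return more than `n` rows when values are equal.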
gdevaux