
I have a data frame with a column named "City" that contains more than 50 different cities. If I plot a bar graph by city, the plot becomes very difficult to read.

Is there a way to first use count() to count how many times each city appears in the data, then select the top 15 cities by that count, and finally plot a bar graph with ggplot()?

2 Answers


To keep only the rows for the 15 most frequent cities, you can do:

library(dplyr)

df %>%
  count(City) %>%
  slice_max(n, n = 15) %>%
  left_join(df, by = 'City') -> res

res
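
Since the question also asks about the plot, here is a minimal sketch of that last step. It assumes ggplot2 is installed and uses a made-up df with a City column as stand-in data. The counts can be plotted directly, so the join back to df is not needed for the bar graph itself.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the real `df` (an assumption for illustration):
# 20 cities, where city k appears k times
df <- data.frame(City = rep(paste0("City", 1:20), times = 1:20))

# One row per city with its count, restricted to the 15 largest counts
top_cities <- df %>%
  count(City) %>%
  slice_max(n, n = 15, with_ties = FALSE)  # keep exactly 15 rows

# Horizontal bars ordered by count, so long city names stay readable
p <- ggplot(top_cities, aes(x = reorder(City, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "City", y = "Number of rows")

print(p)
```

geom_col() is used because the counts are already computed. With the joined res from above, geom_bar() on City would count the rows itself.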

Or in base R:

# names() is needed: table() returns counts, and the city names are its names
res <- subset(df, City %in% names(tail(sort(table(City)), 15)))
Ronak Shah

We can also do the counting with group_by() and summarise():

library(dplyr)
res <- df %>%
   group_by(City) %>%
   summarise(n = n()) %>%
   slice_max(n, n = 15) %>%
   left_join(df, by = 'City') 
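
One thing to be aware of with either dplyr version: slice_max() keeps ties by default, so the result can contain more than 15 cities when several share the 15th-highest count. A small sketch with made-up counts, using n = 2 instead of 15 for brevity:

```r
library(dplyr)

# Made-up counts for illustration: B and C are tied for second place
counts <- data.frame(City = c("A", "B", "C", "D"), n = c(5, 3, 3, 1))

# Default behaviour keeps every row tied at the cutoff
kept_ties <- slice_max(counts, n, n = 2)

# with_ties = FALSE returns exactly 2 rows
dropped_ties <- slice_max(counts, n, n = 2, with_ties = FALSE)
```

If exactly 15 bars are wanted in the plot, with_ties = FALSE is probably the safer choice.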
akrun