
I'm new to R programming for data analysis.

I'm trying to build a project with a dataset named "all_trips_v2", taken from public datasets.

Preview of my dataset

I want a bar chart that shows only the 10 values of "start_station_name" with the highest total count, drawn with ggplot2 and geom_bar(), with each bar showing the proportion of each member type (member_casual).

I ran this code:

ggplot(all_trips_v2, aes(start_station_name,
                         fill = member_casual)) + 
  geom_bar()

The result from the code

As you can see, the result has a lot of bars, one per "start_station_name". I only need the 10 start stations with the highest counts. Please give me some advice. Thank you so much.

I expected to create a bar chart like this:

Expected bar chart.

It's not a good idea to show a data frame as a picture; please use `head(data)` or `dput(data)`, then copy the console output and paste it here. Also see [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Reza Esmaeelzade Apr 03 '23 at 10:32

1 Answer

I don't know of a good way to do this directly in one step, but two steps should be easier to follow anyway. Step 1: summarize your dataset by count. Step 2: filter the dataset to the categories found in the first X rows of that summary.

Here's an example with the built-in chickwts dataset:

library(ggplot2)
df <- chickwts
ggplot(df, aes(feed)) + geom_bar() +
    theme_classic()


To only draw the top 3 bars, you could do the two-step process:

library(dplyr)  # provides %>%, count(), arrange() and filter(); tidyr is not needed here
# STEP 1: summarize by feed count & arrange
df_counts <- df %>%
  count(feed) %>%  # creates column n with counts for feed
  arrange(-n)      # arrange descending by n

# STEP 2: plot with a filtered dataset
ggplot(df %>% dplyr::filter(feed %in% df_counts$feed[1:3]),
  aes(feed)) +
  geom_bar() + theme_classic()

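As a possible variation on the two steps above, the counting and the filtering can share one pipeline if the plot is drawn from the summary table instead of the raw rows. This is only a sketch: `top_n_levels` and `top_feeds` are names made up for illustration, and it switches to `geom_col()` because the counts are already computed.

```r
library(dplyr)
library(ggplot2)

top_n_levels <- 3  # hypothetical name: how many bars to keep

# Count once, sort descending, and keep only the first rows.
top_feeds <- chickwts %>%
  count(feed, sort = TRUE) %>%   # column n holds the count per feed
  slice_head(n = top_n_levels)   # first rows = largest counts

# geom_col() plots precomputed values; reorder() sorts bars by count.
p <- ggplot(top_feeds, aes(x = reorder(feed, -n), y = n)) +
  geom_col() +
  labs(x = "feed", y = "count") +
  theme_classic()
print(p)
```

If several categories tie at the cutoff, `slice_head()` (like indexing with `[1:3]`) keeps an arbitrary subset of them; `slice_max(n, n = top_n_levels)` keeps all tied rows by default, which may be preferable depending on the goal.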

For OP's case, maybe the following would work?

# STEP 1
all_summary <- all_trips_v2 %>%
  count(start_station_name) %>% arrange(-n)

# STEP 2
ggplot(
  all_trips_v2 %>%
    dplyr::filter(start_station_name %in% all_summary$start_station_name[1:10]),
  aes(start_station_name, fill = member_casual)) + 
  geom_bar()
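Since `all_trips_v2` itself is not shown in the question, here is a self-contained sketch of the same idea on made-up data. The data frame `trips`, its station names, and the helper objects `top10` and `trips_top10` are all assumptions for illustration; only the column names `start_station_name` and `member_casual` come from the question. It also orders the bars by total count, which the code above does not do.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Made-up stand-in for all_trips_v2: 15 stations with unequal popularity
stations <- paste("Station", sprintf("%02d", 1:15))
trips <- data.frame(
  start_station_name = sample(stations, 2000, replace = TRUE, prob = 15:1),
  member_casual      = sample(c("member", "casual"), 2000, replace = TRUE)
)

# STEP 1: names of the 10 busiest start stations, busiest first
top10 <- trips %>%
  count(start_station_name, sort = TRUE) %>%
  slice_head(n = 10) %>%
  pull(start_station_name)

# STEP 2: keep only rides from those stations; factor levels fix the bar order
trips_top10 <- trips %>%
  filter(start_station_name %in% top10) %>%
  mutate(start_station_name = factor(start_station_name, levels = top10))

p <- ggplot(trips_top10, aes(start_station_name, fill = member_casual)) +
  geom_bar() +   # stacked counts per station, split by member type
  coord_flip()   # long station names are easier to read on horizontal bars
print(p)
```

If the goal is the share of each member type rather than raw counts, `geom_bar(position = "fill")` rescales every bar to the same height so the split reads as a proportion.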
chemdork123