Patubd, there is a lot going on, and I am afraid that comments alone will not suffice to get you going, so I will point out a few things here.
You did not provide a reproducible example, so I "simulate" some data upfront. You can adapt this to your liking.
In your ggplot() calls you refer to the g dataframe. There is no need to then use the explicit g$variable notation.
You do the same in your MeanMarketCap pipe. I guess that is part of the problems you face.
data:
library(dplyr)
library(ggplot2) # needed for the ggplot() calls further down
set.seed(666) # set seed for random generator
# ------------------- data frame with 60 examples of industry group SIC and MarketCap
df <- data.frame(
    SIC = rep(c("0","1","2"), 20)
  , MarketCap = c(rep(50, 30), rep(1000, 15), rep(2000, 10), rep(3000, 5))
)
# ------------------- add 15 random picks to make it less homogeneous
df <- df %>%
  bind_rows(df %>% sample_n(15))
(I) "less colourful" and/or facets
fig1 <- ggplot(data = df, aes(x = MarketCap, group = SIC, fill = SIC)) +
  geom_histogram(position = "dodge") +
  #------------- as proposed to make graph less colourful / shades of grey ---------
  scale_fill_grey() +
  #---------------------------------------------------------------------------------
  theme_bw() + xlim(0, 5000) +
  labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")
# make a 2nd plot by faceting the one above
# If the plot is stored in an object, i.e. fig1, you do not have to "repeat" the code
# and can just add the facet layer
fig2 <- fig1 + facet_grid(. ~ SIC)
library(patchwork) # cool package to combine plots
fig1 / fig2 # puts one plot above the other
With a facet you break out the groups into separate panels. This supports side-by-side analysis, and the colouring of the groups becomes less important because the grouping is now part of the faceting. But you can combine both, as shown.

(II) summary mean
Your code will work if you do not use the df$variable notation. That notation bypasses the group_by() call: you refer to the column of the full data frame instead of the rows of the current group.
df %>%
group_by(SIC) %>%
summarise(MeanMarketCap = mean(MarketCap))
With the (simplistic) simulated data this yields:
# A tibble: 3 x 2
  SIC   MeanMarketCap
  <chr>         <dbl>
1 0              858.
2 1              876.
3 2              858.
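To see why the df$variable notation defeats the grouping, here is a minimal sketch on a made-up four-row data frame. The name toy and its values are my own illustration, not your data.

```r
library(dplyr)

# Hypothetical toy data, chosen so the group means are easy to check by hand
toy <- data.frame(
  SIC       = c("0", "0", "1", "1"),
  MarketCap = c(10, 20, 100, 200)
)

# Bare column name: mean() only sees the rows of the current group
grouped <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(MarketCap))

# toy$MarketCap: mean() sees the whole column, regardless of the grouping
ungrouped <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(toy$MarketCap))

grouped$MeanMarketCap    # 15 150
ungrouped$MeanMarketCap  # 82.5 82.5
```

The second result repeats the overall mean for every group, which is the symptom you would see when the $ notation sneaks into a grouped pipe.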
To show distributions one can use boxplots. Boxplots work with the inter-quartile spread (25th-75th percentile) and the median (50th percentile).
You can use geom_boxplot() for this. ggplot will take care of the statistical calculation.
df %>%
  ggplot() +
  geom_boxplot(aes(x = SIC, y = MarketCap))
With your data (more varied data points) the plot will look a bit more impressive.
But you can already clearly see the difference in the median across the example industries, SIC.

If you feel like it, you can add your data points with geom_jitter().
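A minimal sketch of the boxplot with the data points layered on top could look like this. The toy data frame and the object name fig3 are hypothetical, and the width, height and alpha values are just one reasonable choice.

```r
library(ggplot2)

# Hypothetical toy data so the snippet runs on its own
toy <- data.frame(
  SIC       = rep(c("0", "1", "2"), each = 10),
  MarketCap = c(rep(50, 10), rep(c(50, 1000), 5), rep(c(1000, 3000), 5))
)

fig3 <- ggplot(toy, aes(x = SIC, y = MarketCap)) +
  # hide the boxplot's own outlier dots, the jitter layer draws every point anyway
  geom_boxplot(outlier.shape = NA) +
  # spread points horizontally only, so the MarketCap values stay exact
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  theme_bw()

fig3
```

Setting height = 0 matters here: by default geom_jitter() also adds vertical noise, which would shift the plotted market values.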
Hope this gets you started. Good luck!