Reformatting bar graph in R

Question

For an assignment, I need to visualize the market value of companies split into groups indicating industries. I have created the following graph (Market Value of Equity graph), but graphs are not allowed to be this colourful in academic articles. The code I used is as follows:

ggplot(data = g, aes(x=g$MarketCap, group = g$SIC, fill=SIC)) +
  geom_histogram(position = "dodge", binwidth = 1000) + theme_bw() + xlim(0,5000) +
  labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")

I tried to find an alternative way to display this, but I've found nothing. Another option is to change the colours of all bars to grey, but then they become indistinguishable. Does anyone know how to fix this? Many thanks in advance.

Patudb, you can use one of the many other colour palettes or assign colours directly, if the aesthetics are the concern (and I agree the default ggplot colours are terrible). Alternatively, you can use facets to create small multiples of your bar chart for the different industries (or whichever grouping variable you choose for the facet). — Ray, May 13 '21 at 09:46
To provide a concrete example of @Ray's solution, replacing `theme_bw()` with `scale_fill_grey()` will give you what you want. `theme_xxx` affects the plot's "furniture", not the data display. — Limey, May 13 '21 at 09:54
There's a package `ggpattern`, see [this answer](https://stackoverflow.com/a/63091901/8245406). Or [here](https://stackoverflow.com/questions/62393159/how-can-i-add-hatches-stripes-or-another-pattern-or-texture-to-a-barplot-in-ggp). — Rui Barradas, May 13 '21 at 10:04
I can change the colors indeed, but I'm afraid this looks too messy. Alternatively, I tried to create a data table containing the average market cap per industry, using the following code: `MarketCapIndustry <- g %>% group_by(g$SIC) %>% summarise(MeanMarketCap = mean(g$MarketCap))`. This results in a mean market cap that is equal for every industry, which is clearly incorrect. Does anybody know how to fix this? I guess it would make plotting a lot easier. — Patudb, May 13 '21 at 11:11

score 0 · Accepted Answer · answered May 13 '21 at 14:07

Patudb, there is a lot going on, and I am afraid that the comments will not suffice to get you going. So I will try to point out a few things here.

You are not providing a reproducible example. Thus, I "simulate" some data upfront. You can adapt this to your liking.

In your ggplot() call you already refer to the g data frame, so there is no need to use the explicit g$variable notation as well.

You do the same in your MeanMarketCap pipe. I guess that is part of the problem you face.

data:

library(dplyr)
library(ggplot2)   # needed for the plots further down
set.seed(666)   # set seed for random generator
# ------------------- data frame with 60 examples of industry group SIC and MarketCap
df <- data.frame(
   SIC        = rep(c("0","1","2"), 20)
  , MarketCap = c(rep(50, 30), rep(1000, 15), rep(2000, 10), rep(3000, 5))
)
# ------------------- add 15 random picks to make it less homogeneous
df <- df %>% 
   bind_rows(df %>% sample_n(15))

(I) "less colourful" and/or facets

fig1 <- ggplot(data = df, aes(x=MarketCap, group = SIC, fill=SIC)) +
    geom_histogram(position = "dodge") + 
#------------- as proposed to make graph less colourful / shades of grey ---------
    scale_fill_grey() + 
#---------------------------------------------------------------------------------
    theme_bw() + xlim(0,5000) +
    labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")


# make a 2nd plot by facetting above
# If the plot is stored in an object, i.e. fig1, you do not have to "repeat" the code
# and just add the facet-layer
fig2 <- fig1 + facet_grid(. ~ SIC)

library(patchwork)   # cool package to combine plots
fig1 / fig2          # puts one plot above the other

With a facet you break out the groups into separate panels. This supports side-by-side comparison, and the colouring of the groups becomes less important because the grouping is now carried by the faceting. But you can combine both as shown.
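Since the facets already separate the industries, one option is to drop the fill mapping altogether and use a single grey for all bars. The following is only a minimal sketch: the `toy` data frame is made up for illustration, and the `binwidth` of 1000 is borrowed from the code in the question.

```r
library(ggplot2)

# Hypothetical toy data, not the asker's data set
toy <- data.frame(
  SIC       = rep(c("0", "1", "2"), each = 6),
  MarketCap = c(50, 50, 1000, 1000, 2000, 3000,
                50, 1000, 1000, 2000, 2000, 3000,
                50, 50, 50, 1000, 2000, 3000)
)

# One panel per industry; a constant fill replaces the colour legend
fig_facet <- ggplot(toy, aes(x = MarketCap)) +
  geom_histogram(binwidth = 1000, boundary = 0,
                 fill = "grey40", colour = "black") +
  facet_wrap(~ SIC, nrow = 1) +
  theme_bw() +
  labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")

fig_facet
```

If a legend is still wanted, keeping `fill = SIC` together with `scale_fill_grey()` as above works just as well.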

(II) summary mean

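Your code will work if you drop the $ notation. Inside summarise(), g$MarketCap refers to the full column of the data frame rather than to the rows of the current group, so the grouping is bypassed and every industry gets the overall mean.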

df %>% 
   group_by(SIC) %>% 
   summarise(MeanMarketCap = mean(MarketCap))

This yields with the - simplistic simulated - data:

# A tibble: 3 x 2
  SIC   MeanMarketCap
  <chr>         <dbl>
1 0              858.
2 1              876.
3 2              858.
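To make the difference between the two notations visible, here is a small sketch on a made-up four-row data frame (`toy` is hypothetical and only serves the illustration). It runs the same pipe once with the `$` notation and once with the bare column name.

```r
library(dplyr)

# Hypothetical toy data, not the asker's data set
toy <- data.frame(
  SIC       = c("0", "0", "1", "1"),
  MarketCap = c(100, 200, 1000, 3000)
)

# Pitfall: toy$MarketCap is always the full column,
# so every group receives the overall mean
wrong <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(toy$MarketCap))

# Bare column name: evaluated separately within each group
right <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(MarketCap))

wrong$MeanMarketCap   # 1075 1075
right$MeanMarketCap   # 150 2000
```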

To show distributions one can use boxplots. Boxplots work with the inter-quartile spread (25th-75th percentile) and the median (50th percentile).
You can use geom_boxplot() for this. ggplot will take care of the statistical calculation.

df %>%
   ggplot() +
   geom_boxplot(aes(x = SIC, y = MarketCap))

With your data (more varied data points) the plot will look a bit more impressive. But you can already clearly see the difference in the median across the example industries, SIC.

If you feel like it, you can add your data points on top with geom_jitter().
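A possible way to combine the two layers is sketched below. The `toy` data frame is again invented for illustration, and the jitter width and transparency are arbitrary choices to adjust to taste.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real data set
toy <- data.frame(
  SIC       = rep(c("0", "1", "2"), each = 10),
  MarketCap = c(seq(50, 500, by = 50),
                seq(500, 2750, by = 250),
                seq(1000, 3250, by = 250))
)

fig_box <- ggplot(toy, aes(x = SIC, y = MarketCap)) +
  # hide the boxplot's own outlier dots so no point is drawn twice
  geom_boxplot(outlier.shape = NA) +
  # jitter horizontally only, so the market values themselves stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  theme_bw() +
  labs(x = "Industry (SIC)", y = "Market Value (in Millions $)")

fig_box
```

Setting `height = 0` matters here: vertical jitter would move the points away from their actual market values.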

Hope this gets you started. Good luck!

Awesome explanation! It really got me started, thank you so much!! — Patudb, May 14 '21 at 07:21
Great. I am happy that it helped. Think about closing this post by accepting the answer and/or post your solution to complement and help others that come here in the future. — Ray, May 14 '21 at 16:01
Yes I will, but I may have one more small question. The x-axis limits overwrite each other in the facet_grid plot, since there are 9 small plots. Is there any way to fix this? — Patudb, May 17 '21 at 07:23
You can increase the spacing between the panels with a theme() layer argument, e.g. `theme(panel.spacing.x = unit(5, "mm"))`. Alternatively, you can add some padding to the upper and lower limit of the x-axis with a scale() call: `scale_x_discrete(expand=c(0.5, 0.5))`. The 3rd option is to change the orientation of the labels: `theme(axis.text.x = element_text(angle=-90, vjust=0.5))`. — Ray, May 24 '21 at 08:07
